ABSTRACT


STRAETER, TERRY ANTHONY. On the extension of the Davidon-Broyden class
of rank one, quasi-Newton minimization methods to an infinite dimensional
Hilbert space with applications to optimal control problems. (Under the
direction of HANS SAGAN).

The various elements of the class of rank one, quasi-Newton minimization methods are distinguished by the manner in which a particular parameter is chosen at each iteration. For various choices of this parameter, conditions are found which guarantee that the algorithm's iterates converge to the location of the minimum of a quadratic functional. Also, conditions are found under which the iterates generated by the Davidon-Fletcher-Powell method, the method of conjugate gradients, and the rank one, quasi-Newton method with a particular choice of the parameter are the same. An idea due to Powell for minimizing a function by a rank one, quasi-Newton method is extended to infinite dimensional Hilbert spaces. Also considered is a modification of the rank one, quasi-Newton methods in order to minimize a functional subject to linear constraints. Conditions are found which guarantee convergence to the location of the constrained minimum of a quadratic functional.

The application of these rank one, quasi-Newton algorithms to various classes of optimal control problems is investigated. Also, the algorithms are applied to a sample optimal control problem. The results are compared with the results for the same problem using other known first-order minimization techniques.
1. INTRODUCTION


1.1 Background and Preview 

In the past few years the problem of finding the location of the minimum value of a real valued function of $n$ real variables by numerical methods has been the subject of a great deal of research [?, 10, 11]. Several iterative procedures have been developed to solve the problem. Much of the work has been directed toward developing algorithms which use the function value and its gradient to locate the minimum by iteration. This type of algorithm is usually referred to as a gradient or first-order method. Historically, the method of steepest descent was the first such method. In order to accelerate convergence, the method of conjugate gradients was developed later by Hestenes and Stiefel [19] and was then applied to the minimization problem by Fletcher and Reeves. Later, first-order methods were developed which were inspired by Newton's second-order method.

Two of the most effective of these techniques are due to Davidon. In 1959 [?] Davidon proposed two techniques for solving the problem. The first method, hereafter denoted by D1, was given in the main body of his report. In 1964 Fletcher and Powell [?] modified D1 and established that for any real valued function the method is stable, that is, does not diverge. (This modified D1 we will denote by DFP.) Moreover, they showed that for a real valued quadratic function of $n$ variables, the DFP algorithm converges in a finite number of steps. In fact, at most $n + 1$ steps are needed. In 1968 Myers [?] showed the relationship between the search directions of the DFP method and those of the conjugate gradient method if the function to be minimized is a quadratic function of $n$ variables. Also in 1968 Horwitz and Sarachik [20] extended the DFP method from an $n$-dimensional Euclidean vector space to an infinite dimensional, real Hilbert space and established convergence of the iterates when the functional to be minimized is quadratic. The result due to Myers was also extended to any real Hilbert space. In 1970 Tokumaru, Adachi, and Goto [36] also extended the DFP algorithm to an infinite dimensional, real Hilbert space and gave a comparison of the DFP method, steepest descent, and the conjugate gradient method on some sample optimal control problems.

The second method due to Davidon, denoted herein by D2, was outlined in the appendix to the 1959 report [?]. Later, in 1968 [?], he published a modification of the second method and established conditions insuring its convergence to the minimum of a quadratic function of $n$ variables in a finite number of steps and insuring the stability of the method. In 1969 [?] Davidon proposed a second modification of the second method. In 1967 Broyden [?] proposed a family of methods based on a parameter $\alpha$, the choice of which was left unspecified. If $\alpha = 1$, then under certain conditions Broyden's method and the second Davidon method, D2, are the same. In 1969 Goldfarb [13] established convergence of the iterates of the Broyden algorithm for a class of real functions of $n$ variables when $\alpha$ is chosen by means of a linear minimization technique (i.e., a one-dimensional search).
The purpose of this paper is to extend the Davidon-Broyden family of algorithms to an infinite dimensional real Hilbert space, to establish conditions guaranteeing convergence of the iterates for various algorithms in the family, and to apply the family of algorithms to optimal control problems.

In chapter 2 of this paper, the Davidon-Broyden family of 
algorithms is extended from an n-dimensional Euclidean vector space 
to an infinite dimensional real Hilbert space. In the case of a 
quadratic functional defined on a real Hilbert space, conditions are 
given which guarantee convergence of the iterates to the location of 
the minimum for Goldfarb’s method of choosing the parameter and for 
a far more general choice of the parameter. In this approach the 
need for a linear minimization is eliminated. 

In chapter 3, the relationship between the Davidon-Broyden algorithm with Goldfarb's method for choosing the parameter, the DFP method, and the method of conjugate gradients is examined. Also, conditions are given which insure that all three methods generate the same search directions. Since the step size is chosen the same way for each method, the same sequence of iterates is generated.

In chapter 4, a modification in the method of choosing the search directions in the extended Davidon-Broyden algorithm is examined. This modification was suggested by Powell in 1970 [?] in an article reviewing the state of the art for finite dimensional optimization. For this modified method, conditions insuring convergence of the iterates to the location of the minimum of a quadratic functional are given.
In chapter 5, the basic algorithm as given in chapter 2 is modified so that it can be applied to a constrained minimization problem. The constrained problem is to find the location of the minimum of a functional $J(x)$ defined on a real Hilbert space $H$, finite or infinite dimensional, subject to the constraint that $Ax = b$, where $b$ is a fixed element of another real Hilbert space $H_1$ and $A: H \to H_1$ is a bounded linear operator.

The mechanics of applying the algorithm to various classes of 
optimal control problems are examined and discussed in chapter 6. 

In many optimal control problems, only controls lying in a subset of the Hilbert space are considered; for example, those functions in $L_2$ whose range is contained in $U$, a compact, convex subset of $R^p$. However, the basic algorithm discussed in chapter 2 updates the estimate of the location of the minimum based only upon the functional's value and its gradient at the old estimate. The new estimate can then lie anywhere in the Hilbert space. Because of this, to apply the basic algorithm to an optimal control problem, its control region $U$ must be a Euclidean space. Park [?] has examined various classes of optimal control problems with a compact, convex control region and, by means of certain transformations, has reformulated these problems so that their new control region is a Euclidean space. The equations necessary to apply the basic algorithm to these transformed problems are also derived in chapter 6.

In chapter 7, the basic algorithm and its modification are applied to one of the sample control problems given by Tokumaru et al. The results are summarized and compared. The results given by Tokumaru et al. [36] comparing the conjugate gradient, steepest descent, and DFP methods for the same problem are presented. The Tokumaru et al. results show the DFP method superior in terms of rate of convergence. The DFP method is then compared with our rank one algorithm.


1.2 Outline of Known Methods

Let $H$ denote a real Hilbert space with the inner product $(\cdot, \cdot)$. Let $R$ denote the real numbers. A functional $J: H \to R$ is said to be differentiable at $x$ if there exists a bounded linear functional $u_x: H \to R$ such that for $h \in H$

$$J(x + h) - J(x) = u_x(h) + \epsilon_x(h), \qquad (1)$$

where $\epsilon_x(h)/\|h\| \to 0$ as $\|h\| \to 0$ (Fréchet differential). If such a functional $u_x$ exists, then it is unique [33]. Moreover, by the Riesz representation theorem there exists a $g(x) \in H$ such that $(g(x), h) = u_x(h)$ for all $h \in H$, and $g(x)$ is given by

$$(g(x), h) = \left.\frac{dJ(x + th)}{dt}\right|_{t=0}. \qquad (2)$$

We call $g(x)$ the gradient of the functional $J$.

Suppose we wish to find the location of the minimum value of a differentiable functional $J: H \to R$ with gradient $g(x)$ at each point $x$. The three iterative techniques, steepest descent, conjugate gradients, and DFP, could be applied to finding the location of the minimum of $J$. These algorithms are all descent methods and are distinguished from each other only by the manner in which the search direction is computed. If $x_0 \in H$ is the initial estimate of the minimum and $i = 0$, the algorithms are as follows:

Step 1: Compute $J(x_i)$ and $g(x_i)$; if $\|g(x_i)\| = 0$, stop; otherwise,

Step 2: Let $x_{i+1} = x_i + \alpha_i s_i$, where $s_i \in H$ is called the search direction and $\alpha_i$, called the step size, is a real number chosen so that $J(x_i + \alpha_i s_i) \le J(x_i + \lambda s_i)$ for all $\lambda \in R$. The search direction $s_i$ for the above-mentioned methods is chosen in one of the following three ways.

If $s_i = -g(x_i)$, then the algorithm is the classical method of steepest descent [25].

If $s_i = -g(x_i) + \beta_{i-1} s_{i-1}$, where

$$\beta_{i-1} = \frac{(g(x_i), g(x_i))}{(g(x_{i-1}), g(x_{i-1}))},$$

and $s_0 = -g(x_0)$, then the algorithm is called the method of conjugate gradients [11, 18, 19, 23, 25, 34].

Finally, we have the DFP method if $s_i = -H^{(i)} g(x_i)$, where the $H^{(i)}$, $i = 0, 1, 2, \ldots$, are a sequence of linear operators defined iteratively as follows: $H^{(0)}$ is a strongly positive, linear, self-adjoint operator on $H$, and $H^{(i+1)} = H^{(i)} + A^{(i)} + C^{(i)}$, where $A^{(i)}, C^{(i)}: H \to H$ are such that if $x \in H$,

$$A^{(i)} x = \frac{(\sigma_i, x)}{(\sigma_i, y_i)}\, \sigma_i, \qquad C^{(i)} x = -\frac{(H^{(i)} y_i, x)}{(y_i, H^{(i)} y_i)}\, H^{(i)} y_i,$$

where

$$y_i = g(x_{i+1}) - g(x_i) \qquad \text{and} \qquad \sigma_i = x_{i+1} - x_i.$$

We replace $i$ by $i + 1$, return to step 1, and continue.
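The three methods differ only in how $s_i$ is formed, which a short sketch makes concrete. The Python code below is an illustration under stated assumptions, not part of the original development: $H$ is replaced by $R^n$ with vectors stored as lists of floats, $J$ is the quadratic functional of section 1.3 with $J_0 = 0$ (so the exact step has the closed form $\alpha_i = -(s_i, g_i)/(s_i, A s_i)$), DFP starts from $H^{(0)} = I$, and `minimize`, `dot`, `matvec`, and `axpy` are hypothetical helper names.

```python
# Assumption: R^n stands in for H, and J(x) = (x, b) + 0.5 (x, A x) with A symmetric
# positive definite. All names here are hypothetical helpers for this sketch.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def axpy(a, x, y):
    """Return y + a * x."""
    return [yi + a * xi for xi, yi in zip(x, y)]


def minimize(A, b, x0, method, tol=1e-12, max_iter=200):
    n = len(x0)
    x = [float(v) for v in x0]
    g = axpy(1.0, b, matvec(A, x))                 # g(x) = b + A x
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # DFP: H^(0) = I
    s, g_old = None, None
    path = [list(x)]
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 <= tol:                # Step 1: stop when ||g(x_i)|| = 0
            break
        neg_g = [-gi for gi in g]
        if method == "sd":                         # steepest descent
            s = neg_g
        elif method == "cg":                       # conjugate gradients, beta_{i-1}
            s = neg_g if s is None else axpy(dot(g, g) / dot(g_old, g_old), s, neg_g)
        elif method == "dfp":                      # s_i = -H^(i) g(x_i)
            s = [-v for v in matvec(H, g)]
        else:
            raise ValueError(f"unknown method {method!r}")
        alpha = -dot(s, g) / dot(s, matvec(A, s))  # Step 2: exact step on a quadratic
        x_new = axpy(alpha, s, x)
        g_new = axpy(1.0, b, matvec(A, x_new))
        if method == "dfp":                        # H^(i+1) = H^(i) + A^(i) + C^(i)
            sigma = [p - q for p, q in zip(x_new, x)]
            y = [p - q for p, q in zip(g_new, g)]
            Hy = matvec(H, y)
            sy, yHy = dot(sigma, y), dot(y, Hy)
            H = [[H[i][j] + sigma[i] * sigma[j] / sy - Hy[i] * Hy[j] / yHy
                  for j in range(n)] for i in range(n)]
        x, g_old, g = x_new, g, g_new
        path.append(list(x))
    return x, path
```

With $H^{(0)} = I$ the DFP and conjugate gradient runs should trace the same iterates up to rounding, which is the Horwitz and Sarachik result quoted in section 1.3; steepest descent reaches the same point but generally needs many more steps.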

A summary of the results known concerning the application of these three techniques to quadratic functionals will be given at the end of the next section.


1.3 Quadratic Functionals 

Let $A: H \to H$ be a linear, self-adjoint operator such that

$$m\|x\|^2 \le (x, Ax) \le M\|x\|^2, \qquad (3)$$

where

$$M = \sup_{x \ne \theta} \frac{(x, Ax)}{\|x\|^2}, \qquad m = \inf_{x \ne \theta} \frac{(x, Ax)}{\|x\|^2}, \qquad (4)$$

and where we assume that $0 < m \le M$. Hence, $\|A\| = M$ [?]. Since $m > 0$, $A^{-1}$ exists and $A^{-1}$ is also self-adjoint. Moreover, we have

$$\frac{1}{M}\|x\|^2 \le (x, A^{-1}x) \le \frac{1}{m}\|x\|^2. \qquad (5)$$
We call the functional $J: H \to R$ given by

$$J(x) = J_0 + (x, b) + \tfrac{1}{2}(x, Ax) \qquad (6)$$

a quadratic functional on $H$, where $b$ is a fixed element in $H$ and $J_0 \in R$. Using (2) and (6) we can compute the gradient $g(x)$ of the quadratic functional as follows:

$$\begin{aligned}
\frac{dJ(x + th)}{dt} &= \frac{d}{dt}\Big[J_0 + (x + th, b) + \tfrac{1}{2}(x + th, A(x + th))\Big] \\
&= \frac{d}{dt}\Big[J_0 + (x, b) + t(h, b) + \tfrac{1}{2}\big((x, Ax) + 2t(h, Ax) + t^2(h, Ah)\big)\Big] \\
&= (h, b) + (h, Ax) + t(h, Ah),
\end{aligned}$$

where $A = A^*$ was used to combine $(x, Ah)$ and $(h, Ax)$, and we have

$$\left.\frac{dJ(x + th)}{dt}\right|_{t=0} = (h, b + Ax).$$

Therefore, by (2), the gradient $g(x)$ of the quadratic functional $J(x)$ is given by

$$g(x) = b + Ax. \qquad (7)$$
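Because $A$ is self-adjoint, the closed form (7) can be checked against the definition (2) numerically. In the sketch below (an illustration only, with $R^n$ in place of $H$ and hypothetical helper names), a central difference in $t$ approximates $dJ(x + th)/dt$ at $t = 0$; for a quadratic the $t^2$ terms cancel, so the difference quotient agrees with $(h, b + Ax)$ up to rounding.

```python
# Assumption: R^n stands in for H and A is a symmetric matrix; names are hypothetical.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def J(x, J0, b, A):
    """Quadratic functional (6): J0 + (x, b) + 0.5 (x, A x)."""
    return J0 + dot(x, b) + 0.5 * dot(x, matvec(A, x))


def gradient(x, b, A):
    """Closed form (7): g(x) = b + A x (requires A self-adjoint)."""
    return [bi + ai for bi, ai in zip(b, matvec(A, x))]


def directional_derivative(x, h, J0, b, A, t=1e-3):
    """Central-difference estimate of d/dt J(x + t h) at t = 0, cf. (2)."""
    xp = [xi + t * hi for xi, hi in zip(x, h)]
    xm = [xi - t * hi for xi, hi in zip(x, h)]
    return (J(xp, J0, b, A) - J(xm, J0, b, A)) / (2.0 * t)
```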

The following well-known theorem states a necessary and sufficient condition for $\bar{x}$ to minimize $J(x)$:
Theorem 1.1: A necessary and sufficient condition that $\bar{x}$ minimizes $J(x)$ as given by (6) is that $g(\bar{x}) = \theta$, where $\theta$ denotes the zero element of $H$.

Proof: Suppose $g(\bar{x}) = \theta$; then $A\bar{x} + b = \theta$ by (7), so that $b = -A\bar{x}$. Hence, if $x \ne \bar{x}$,

$$\begin{aligned}
J(\bar{x}) - J(x) &= J_0 + (\bar{x}, b) + \tfrac{1}{2}(\bar{x}, A\bar{x}) - J_0 - (x, b) - \tfrac{1}{2}(x, Ax) \\
&= (\bar{x}, -A\bar{x}) + \tfrac{1}{2}(\bar{x}, A\bar{x}) + (x, A\bar{x}) - \tfrac{1}{2}(x, Ax)
\end{aligned}$$

since $b = -A\bar{x}$. Therefore, $J(\bar{x}) - J(x) = -\tfrac{1}{2}(x - \bar{x}, A(x - \bar{x}))$ since $A = A^*$, and $(x - \bar{x}, A(x - \bar{x})) \ge m\|x - \bar{x}\|^2 > 0$ by (3). Hence, $J(\bar{x}) - J(x) \le -\tfrac{1}{2}m\|x - \bar{x}\|^2 < 0$. So $\bar{x}$ is the location of the minimum of $J$.

Conversely, let us suppose that $J(\bar{x}) \le J(x)$ for all $x \in H$. If we let $h \in H$, $h \ne \theta$, be fixed, then for $t \in R$ we have

$$J(\bar{x} + th) - J(\bar{x}) \ge 0. \qquad (8)$$

Hence,

$$\begin{aligned}
0 &\le J_0 + (\bar{x} + th, b) + \tfrac{1}{2}(\bar{x} + th, A(\bar{x} + th)) - J_0 - (\bar{x}, b) - \tfrac{1}{2}(\bar{x}, A\bar{x}) \\
&= t(h, b) + t(h, A\bar{x}) + \tfrac{1}{2}t^2(h, Ah) \\
&= t\big[(h, g(\bar{x})) + \tfrac{1}{2}t(h, Ah)\big] \le t(h, g(\bar{x})) + \tfrac{1}{2}t^2 M\|h\|^2.
\end{aligned}$$

Now suppose $(h, g(\bar{x})) < 0$; then, since $M$, $\|h\|^2$ and $(g(\bar{x}), h)$ are constants and $M > 0$, $\|h\|^2 > 0$, we can force $t(h, g(\bar{x})) + \tfrac{1}{2}t^2 M\|h\|^2 < 0$ by letting $t \to 0^+$. So this would imply $J(\bar{x} + th) - J(\bar{x}) < 0$, which contradicts (8). Similarly, if $(h, g(\bar{x})) > 0$, by letting $t \to 0^-$ we have $t(h, g(\bar{x})) + \tfrac{1}{2}t^2 M\|h\|^2 < 0$, which again contradicts (8). Hence, it must be true that $(h, g(\bar{x})) = 0$, and since $h$ was an arbitrary nonzero element of $H$, it follows that $g(\bar{x}) = \theta$.

Theorem 1.2: If $\bar{x}$ denotes the location of the minimum of the quadratic functional $J$ given by (6), then

$$\bar{x} = -A^{-1}b. \qquad (9)$$

Moreover, if $x, h \in H$ are such that $x + h = \bar{x}$, then

$$h = -A^{-1}g(x). \qquad (10)$$

Proof: By theorem 1.1 and (7), $\theta = g(\bar{x}) = A\bar{x} + b$, so that $\bar{x} = -A^{-1}b$ since $A^{-1}$ exists. If $x + h = \bar{x}$, then $g(x + h) = g(\bar{x}) = \theta$. So $A(x + h) + b = \theta$, that is, $Ax + Ah = -b$. Hence, $Ah = -(Ax + b) = -g(x)$ by (7). Therefore, $h = -A^{-1}g(x)$.

Of course, the equation $h = -A^{-1}g(x)$ is the basis for the well-known Newton-Raphson method for minimizing a functional on a Hilbert space [22].
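For the quadratic functional (6), a single step of (10) from any starting point lands exactly on $\bar{x}$. The sketch below is illustrative only: it works in $R^n$, solves $Ah = -g(x)$ by Gaussian elimination rather than forming $A^{-1}$ explicitly, and its helper names are hypothetical.

```python
# Assumption: R^n stands in for H; A is symmetric positive definite. Names are hypothetical.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [[float(a) for a in row] + [float(c)] for row, c in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def newton_step(x, A, b):
    """Return x + h with h = -A^{-1} g(x), equation (10); g(x) = b + A x from (7)."""
    g = [bi + ai for bi, ai in zip(b, matvec(A, x))]
    h = solve(A, [-gi for gi in g])
    return [xi + hi for xi, hi in zip(x, h)]
```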

Other useful results due to the fact that $J$ is a quadratic functional are the following. If $x, x^* \in H$, then

$$A^{-1}(g(x) - g(x^*)) = A^{-1}(Ax + b - Ax^* - b) = x - x^*. \qquad (11)$$
Hence, if we let $y = g(x) - g(x^*)$ and $\sigma = x - x^*$, we have

$$A^{-1}y = \sigma. \qquad (12)$$

Moreover, if $s \in H$ and $\alpha \in R$ are such that

$$x^* = x + \alpha s, \qquad (13)$$

then by (7) and (13), $g(x^*) = Ax^* + b = Ax + b + \alpha As$. So, by (7), we have

$$g(x^*) = g(x) + \alpha As. \qquad (14)$$


Also, for all $x_0 \in H$ the smallest closed, convex set containing the points $x \in H$ at which $J(x) \le J(x_0)$ is bounded [23]. We denote this set by $\operatorname{conv}\{x \in H : J(x) \le J(x_0)\}$.

It is known [20] that if a quadratic functional is minimized by the conjugate gradient method, the $i$th search direction is given by

$$s_i = -\|g(x_i)\|^2 \sum_{l=0}^{i} \frac{g(x_l)}{\|g(x_l)\|^2}. \qquad (15)$$


Horwitz and Sarachik have shown for a quadratic functional that the $i$th search direction of the DFP method is given by

$$s_i = -(g(x_i), H^{(0)}g(x_i)) \sum_{l=0}^{i} \frac{H^{(0)}g(x_l)}{(g(x_l), H^{(0)}g(x_l))}. \qquad (16)$$

If $H^{(0)}$ is the identity, denoted by $I$, then (15) and (16) are the same directions. Since the step size is picked in the same fashion for both methods, they will generate the same sequence of iterates [20].
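Formula (15) can be compared directly with the recursive definition of the conjugate gradient directions in section 1.2. The sketch below is illustrative ($R^n$ in place of $H$, exact line searches on the quadratic (6) with $J_0 = 0$, hypothetical helper names): it runs the recursion and then rebuilds each $s_i$ from the gradients alone.

```python
# Assumption: R^n stands in for H; J is the quadratic (6). Names are hypothetical.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def cg_run(A, b, x0, steps):
    """Conjugate gradients with exact line searches.

    Returns the gradients g(x_0), ..., g(x_{steps-1}) and directions s_0, ..., s_{steps-1}.
    """
    x = [float(v) for v in x0]
    g = [bi + ai for bi, ai in zip(b, matvec(A, x))]
    s = [-gi for gi in g]
    gs, ss = [g], [s]
    for _ in range(steps - 1):
        a = -dot(s, g) / dot(s, matvec(A, s))              # exact step
        x = [xi + a * si for xi, si in zip(x, s)]
        g_new = [bi + ai for bi, ai in zip(b, matvec(A, x))]
        beta = dot(g_new, g_new) / dot(g, g)                # beta_{i-1}
        s = [-gn + beta * si for gn, si in zip(g_new, s)]
        g = g_new
        gs.append(g)
        ss.append(s)
    return gs, ss


def direction_from_15(gs, i):
    """s_i = -||g_i||^2 * sum_{l=0}^{i} g_l / ||g_l||^2, formula (15)."""
    gi2 = dot(gs[i], gs[i])
    return [-gi2 * sum(g[k] / dot(g, g) for g in gs[: i + 1]) for k in range(len(gs[i]))]
```

The two constructions should agree to rounding error as long as no gradient vanishes before the last requested step.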
The convergence of the iterates to the location of the minimum by the method of steepest descent, the method of conjugate gradients, and the DFP algorithm has been established [20], [36] for the case where the functional to be minimized is quadratic.

A note concerning the notation to be used throughout this paper would appear to be in order. It shall be our practice that if reference is made to an equation, identity, or relation in the same chapter, only the number at the right-hand side of the page will be enclosed in parentheses. However, if the reference is to an equation, etc., in another chapter, then the chapter number followed by a period and then by the reference number will be given. Theorems will be numbered sequentially with a chapter prefix, that is, as theorem 1.1, and will be referenced in that fashion. The numbers enclosed in square brackets refer to the references in chapter 8.

Also herein we shall denote by $L_2^r[t_0, t_1]$ the real Hilbert space of Lebesgue measurable functions $u = u(t)$ defined on $[t_0, t_1]$ with range in $R^r$ (Euclidean $r$-space) such that

$$\int_{t_0}^{t_1} \sum_{i=1}^{r} u_i^2(t)\, dt < \infty,$$

where $u_1(t), u_2(t), \ldots, u_r(t)$ are the components of $u$.
2. THE CLASS OF DAVIDON-BROYDEN ALGORITHMS

In this chapter, we shall discuss the extension to an infinite dimensional Hilbert space of the Davidon-Broyden minimization algorithms alluded to in chapter 1. We shall also give conditions insuring the convergence of the iterates of various members of this family of algorithms in the case where the functional to be minimized is quadratic. In the case of a finite dimensional Hilbert space, Broyden [?] called this family of algorithms "quasi-Newton methods." Special cases of this family have been called the "optimal variance algorithm" by Goldfarb [13] and the "rank one variance algorithm" by Davidon [9]. The author's contribution is to show the relationship of these methods to each other, to extend their applicability to infinite dimensional real Hilbert spaces, and to establish conditions insuring convergence of the iterates. For the latter purpose, new proofs of convergence of the algorithm's various manifestations were necessary.

2.1 Outline of the Class of Algorithms

Let $J: H \to R$ be a differentiable functional with gradient $g(x)$. Let $x_0 \in H$ be the initial estimate of the location of the minimum of $J$, and let $V^{(0)}$ be a self-adjoint, strongly positive linear operator from $H$ onto $H$. Let $M_0 \ge m_0 > 0$ be such that $m_0 I \le V^{(0)} \le M_0 I$. If $J$, the functional to be minimized, is quadratic as in chapter 1, then $V^{(0)}$ is an estimate of $A^{-1}$. We compute $J(x_0)$ and $g(x_0)$ and obtain the first iteration as follows:
Step 1: Let

$$x^* = x_n - \alpha_n V^{(n)} g_n, \qquad (1)$$

where $g_n$ denotes $g(x_n)$ and $\alpha_n$ is a scalar, the choice of which is discussed later. Let

$$s_n = -V^{(n)} g_n \qquad (2)$$

and compute $J(x^*)$ and $g(x^*)$, denoted by $g^*$; if $\|g^*\| = 0$, a necessary condition for $x^*$ to be the location of the minimum, we stop. If $J$ is a quadratic functional and $g^* = \theta$, then by theorem 1.1 $x^*$ is the location of the minimum.

Step 2: Compute the residual vector

$$r_n = V^{(n)} g^* - V^{(n)} g_n + \alpha_n V^{(n)} g_n, \qquad (3)$$

that is,

$$r_n = V^{(n)}\big(g^* - (1 - \alpha_n) g_n\big) \qquad (4)$$

or

$$r_n = V^{(n)} y_n - \alpha_n s_n, \qquad (5)$$

where $y_n = g^* - g_n$. If $r_n = \theta$, then set $\alpha_n = 1$ and return to step 1.
Step 3: Define scalars

$$\rho_n = (g^*, r_n) \qquad (6)$$

and

$$\gamma_n = -\frac{(g_n, r_n)}{\rho_n}, \qquad (7)$$

and let

$$\lambda_n = \begin{cases} \dfrac{\gamma_n}{1 + \gamma_n} & \text{if } \gamma_n \ne -1, \\[1ex] 1 & \text{if } \gamma_n = -1. \end{cases} \qquad (8)$$

Step 4: Let

$$V^{(n+1)} = V^{(n)} + \frac{\lambda_n - 1}{\rho_n} B^{(n)}, \qquad (9)$$

where $B^{(n)}: H \to H$ is defined such that for all $x \in H$

$$B^{(n)} x = (x, r_n) r_n. \qquad (10)$$

Step 5: If $J(x^*) < J(x_n)$, let $x_{n+1} = x^*$ and, consequently, $J(x_{n+1}) = J(x^*)$ and $g_{n+1} = g^*$; otherwise, let $x_{n+1} = x_n$, so that $J(x_{n+1}) = J(x_n)$ and $g_{n+1} = g_n$. Set $n = n + 1$ and go to step 1.
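Steps 1 through 5 can be collected into a single loop. The Python sketch below illustrates them under explicit assumptions and is not the author's implementation: $H$ is $R^n$; $J$ is the quadratic functional (1.6) with $J_0 = 0$, so Goldfarb's exact choice of $\alpha_n$ can use the closed form derived later in this chapter as (19); when $r_n = \theta$ the sketch takes $x^* = x_n - V^{(n)}g_n$ directly (justified by theorem 2.9 when $V^{(n)}$ is invertible) instead of looping back; $\rho_n \ne 0$ is assumed, as in the text; and `davidon_broyden`, `dot`, and `matvec` are hypothetical names.

```python
# Assumption: R^n stands in for H; J(x) = (x, b) + 0.5 (x, A x). Names are hypothetical.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def davidon_broyden(A, b, x0, V0, alpha_rule="exact", tol=1e-10, max_iter=50):
    """Steps 1-5 of section 2.1 on a quadratic in R^n.

    alpha_rule="exact": alpha_n minimizes J along s_n (Goldfarb), closed form (19).
    alpha_rule="unit":  alpha_n = 1 (Davidon), without his safeguards on lambda_n.
    Returns (x, V, number of passes). Assumes rho_n != 0 whenever an update is made.
    """
    n = len(x0)
    grad = lambda x: [bi + ai for bi, ai in zip(b, matvec(A, x))]   # g = b + A x
    J = lambda x: dot(x, b) + 0.5 * dot(x, matvec(A, x))           # J_0 taken as 0
    x = [float(v) for v in x0]
    g = grad(x)
    V = [[float(v) for v in row] for row in V0]
    for k in range(1, max_iter + 1):
        if dot(g, g) ** 0.5 <= tol:                                 # already stationary
            return x, V, k - 1
        s = [-v for v in matvec(V, g)]                              # (2)
        alpha = -dot(s, g) / dot(s, matvec(A, s)) if alpha_rule == "exact" else 1.0
        x_star = [xi + alpha * si for xi, si in zip(x, s)]          # (1)
        g_star = grad(x_star)
        if dot(g_star, g_star) ** 0.5 <= tol:                       # Step 1: stop
            return x_star, V, k
        y = [p - q for p, q in zip(g_star, g)]                      # y_n = g* - g_n
        r = [vy - alpha * si for vy, si in zip(matvec(V, y), s)]    # Step 2, (5)
        if dot(r, r) ** 0.5 <= tol:
            # r_n = 0: use alpha_n = 1, i.e. x* = x_n - V^(n) g_n (theorem 2.9)
            x_star = [xi + si for xi, si in zip(x, s)]
            g_star = grad(x_star)
        else:
            rho = dot(g_star, r)                                    # Step 3, (6)
            gamma = -dot(g, r) / rho                                # (7)
            lam = gamma / (1.0 + gamma) if gamma != -1.0 else 1.0   # (8)
            c = (lam - 1.0) / rho
            V = [[V[i][j] + c * r[i] * r[j] for j in range(n)]      # Step 4, (9)-(10)
                 for i in range(n)]
        if J(x_star) < J(x):                                        # Step 5
            x, g = x_star, g_star
    return x, V, max_iter
```

In matrix terms, step 4 adds the rank one matrix $c\, r_n r_n^T$ to $V^{(n)}$; written out, this is the update now commonly called the symmetric rank one (SR1) update of an inverse Hessian approximation.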

The elements of the class of algorithms outlined above are distinguished by the manner in which the parameter $\alpha_n$ is chosen with each iteration. Davidon [?], Broyden [?], and Goldfarb [13] proposed techniques for choosing $\alpha_n$ in the finite dimensional case. For Davidon's rank one variance algorithm, $\alpha_n = 1$ for all $n$; however, the scalar $\lambda_n$ given by (8) is chosen so that certain inequality constraints are satisfied. These constraints insure that Davidon's $V^{(n)}$ remain positive definite. Goldfarb's optimal variance algorithm requires that $\alpha_n$ be chosen so that $J(x_n + \alpha s_n)$ is minimized with respect to $\alpha$. The Broyden quasi-Newton method requires only that $\alpha_n$ be chosen so that $(\ldots)^{-1}$ exists. For a quadratic functional, theorem 2.7, proved later, shows that for $V^{(0)} \le A^{-1}$ or $V^{(0)} \ge A^{-1}$ either Davidon's or Goldfarb's method of choosing $\alpha_n$ satisfies Broyden's criteria.

For the remainder of chapter 2, we shall assume that the functional to be minimized is quadratic as defined in section 3 of chapter 1. We shall make note of any results which are independent of the type of functional to be minimized.

2.2 Theorems That Are Independent of the Choice of $\alpha_n$

Theorem 2.1: $B^{(n)}$ as given in (10) is a self-adjoint, positive operator for all $n$, for any choice of $\alpha_n$.

Proof: If $x \in H$, then

$$(x, B^{(n)}x) = (x, (x, r_n) r_n) = (x, r_n)^2 \ge 0,$$

and if $x, y \in H$, then

$$(x, B^{(n)}y) = (x, (y, r_n) r_n) = (y, r_n)(x, r_n) = (y, (x, r_n) r_n) = (y, B^{(n)}x).$$
Theorem 2.2: $V^{(n)}$ is self-adjoint for all $n$, for any choice of $\alpha_n$.

Proof: By (9),

$$V^{(n)} = V^{(0)} + \sum_{i=0}^{n-1} \frac{\lambda_i - 1}{\rho_i} B^{(i)},$$

and $V^{(0)}$ is self-adjoint by definition. By the above theorem, the $B^{(i)}$'s are self-adjoint, and a finite sum of real multiples of self-adjoint operators is self-adjoint.

Notice that the two theorems proved above are independent of the type of functional that is to be minimized.

We have seen in chapter 1 that the location of the minimum $\bar{x}$ of a quadratic functional is given by $\bar{x} = x_n - A^{-1}g_n$. Also recall from chapter 1 that the change in $x$ from one iteration to the next for the Newton-Raphson method is given by $-A^{-1}g_n$. In the algorithm outlined in section 1, the change is $-\alpha_n V^{(n)} g_n$; hence the name quasi-Newton was given to the finite dimensional form of these algorithms by Broyden [?]. The search directions for the algorithm outlined in section 1 are given by $-V^{(n)}g_n$, and we want $V^{(n)}$ to play the role of $A^{-1}$. Hence, it is desirable that the sequence of operators $V^{(n)}$ retain from one iteration to the next the following property: if for some $u \in H$, $A^{-1}u = V^{(n)}u$, then $A^{-1}u = V^{(n+1)}u$.

By the definition of the vector $r_n$ we have the following general result.

Theorem 2.3: If $u \in H$ is such that $A^{-1}u = V^{(n)}u$ and $B: H \to H$ is a linear operator such that $B - V^{(n)} = \mu B^{(n)}$ for some real $\mu$, then $A^{-1}u = Bu$.

Proof: Since $A^{-1}(g^* - g_n) = x^* - x_n$ by (1.12) and $x^* = x_n - \alpha_n V^{(n)} g_n$ by definition,

$$x^* - x_n = -\alpha_n V^{(n)} g_n = A^{-1}(g^* - g_n), \qquad (11)$$

and by (3)

$$r_n = V^{(n)}(g^* - g_n) + \alpha_n V^{(n)} g_n.$$

Therefore, $r_n = V^{(n)}(g^* - g_n) - A^{-1}(g^* - g_n)$ by (11), and hence

$$r_n = (V^{(n)} - A^{-1})(g^* - g_n). \qquad (12)$$

Since $A^{-1}u = V^{(n)}u$, we have

$$(V^{(n)} - A^{-1})u = \theta. \qquad (13)$$

So

$$(r_n, u) = \big((V^{(n)} - A^{-1})(g^* - g_n), u\big) = \big(g^* - g_n, (V^{(n)} - A^{-1})u\big) = (g^* - g_n, \theta) = 0$$

by theorem 2.2 and equations (12) and (13). Hence, the hypothesis $(B - V^{(n)})u = \mu B^{(n)}u$ implies

$$(B - V^{(n)})u = \mu (u, r_n) r_n = \mu \cdot 0 \cdot r_n = \theta.$$

Therefore, $Bu = V^{(n)}u = A^{-1}u$.
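By (9),

$$V^{(n+1)} = V^{(n)} + \frac{\lambda_n - 1}{\rho_n} B^{(n)}.$$

Since $V^{(n+1)} - V^{(n)} = \frac{\lambda_n - 1}{\rho_n} B^{(n)}$, theorem 2.3 applies with $B = V^{(n+1)}$ and $\mu = (\lambda_n - 1)/\rho_n$, and we have the following:

Corollary 1: If $V^{(n)}u = A^{-1}u$ for some $u \in H$, then $V^{(n+1)}u = A^{-1}u$.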
In chapter 1 we showed that for a quadratic functional

$$A^{-1}(g^* - g_n) = x^* - x_n.$$

The following theorem gives the fundamental reason for our choice of $\lambda_n$, that is, so that when $\gamma_n \ne -1$, $V^{(n+1)}$ and $A^{-1}$ will agree on the space spanned by $g^* - g_n$.

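Theorem 2.4 (Basic theorem): If $\gamma_n \ne -1$, then

$$V^{(n+1)}(g^* - g_n) = x^* - x_n,$$

that is,

$$V^{(n+1)} y_n = \alpha_n s_n.$$

Proof: If $r_n = \theta$, then by (10) $B^{(n)}$ is the zero operator; therefore $V^{(n+1)} = V^{(n)}$, so that $V^{(n+1)} y_n = V^{(n)} y_n = \alpha_n s_n$ by (5). Otherwise, consider

$$\begin{aligned}
V^{(n+1)} y_n - \alpha_n s_n &= V^{(n)} y_n + \frac{\lambda_n - 1}{\rho_n}(r_n, y_n)\, r_n - \alpha_n s_n && \text{(by (9), (10))} \\
&= r_n + \frac{\lambda_n - 1}{\rho_n}(r_n, y_n)\, r_n && \text{(by (5))} \\
&= r_n \Big[1 + \frac{\lambda_n - 1}{\rho_n}\big((r_n, g^*) - (r_n, g_n)\big)\Big] \\
&= r_n \Big[1 + \frac{\lambda_n - 1}{\rho_n}(\rho_n + \gamma_n \rho_n)\Big] && \text{(by (6), (7))} \\
&= r_n \big[1 + (\lambda_n - 1)(1 + \gamma_n)\big] = r_n \cdot 0 = \theta && \text{(by (8))}.
\end{aligned}$$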
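Since the basic theorem does not use the quadratic structure of $J$, it can be checked on a non-quadratic example. The sketch below is illustrative only: it takes $H = R^2$, a hypothetical functional $J(x) = e^{x_0} + x_0 x_1 + x_1^2$ chosen for this example (it does not appear in the text), an arbitrary $\alpha_n = 0.7$ and an arbitrary symmetric positive definite $V^{(n)}$, performs steps 1 to 4 once, and measures $V^{(n+1)}y_n - \alpha_n s_n$.

```python
import math

# Assumption: R^2 stands in for H; the functional, alpha_n and V^(n) below are
# hypothetical test data, and rho_n != 0 is assumed whenever r_n != 0.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def rank_one_pass(grad, x, V, alpha):
    """Steps 1-4 of section 2.1 for a given alpha_n; returns (s_n, y_n, V^(n+1))."""
    n = len(x)
    g = grad(x)
    s = [-v for v in matvec(V, g)]                              # (2)
    x_star = [xi + alpha * si for xi, si in zip(x, s)]          # (1)
    g_star = grad(x_star)
    y = [p - q for p, q in zip(g_star, g)]                      # y_n = g* - g_n
    r = [vy - alpha * si for vy, si in zip(matvec(V, y), s)]    # (5)
    if dot(r, r) == 0.0:                                        # B^(n) = 0: no update
        return s, y, [row[:] for row in V]
    rho = dot(g_star, r)                                        # (6)
    gamma = -dot(g, r) / rho                                    # (7)
    lam = gamma / (1.0 + gamma) if gamma != -1.0 else 1.0       # (8)
    c = (lam - 1.0) / rho
    V_next = [[V[i][j] + c * r[i] * r[j] for j in range(n)] for i in range(n)]  # (9)
    return s, y, V_next


def grad_example(x):
    """Gradient of the hypothetical J(x) = exp(x0) + x0*x1 + x1**2."""
    return [math.exp(x[0]) + x[1], x[0] + 2.0 * x[1]]
```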
Notice that the basic theorem is independent of the fact that $J$ is a quadratic functional. The following corollary combines theorems 2.3 and 2.4 to show that each iteration, if $\gamma_n \ne -1$, raises by one the dimension of the subspace on which $V^{(n)}$ and $A^{-1}$ agree. Hence, some authors [10] have called the finite dimensional form of this algorithm a rank one method.

Corollary 1 (Fundamental property of $V^{(n)}$): $V^{(n+1)} y_i = \alpha_i s_i$ for all $i \le n$ if $\gamma_j \ne -1$ for $j = 0, 1, \ldots, n$.

Proof (by mathematical induction): $V^{(1)} y_0 = \alpha_0 s_0$ by theorem 2.4. Assume $V^{(n)} y_i = \alpha_i s_i$ for all $i < n$, and consider $V^{(n+1)} y_i$. For $i = n$, $V^{(n+1)} y_n = \alpha_n s_n$ by theorem 2.4. Otherwise, for $i < n$, since $A^{-1}y_i = \alpha_i s_i$ by (1.12) and $V^{(n)} y_i = \alpha_i s_i$, $A^{-1}$ and $V^{(n)}$ agree on $y_i$. The corollary to theorem 2.3 then implies $V^{(n+1)} y_i = \alpha_i s_i$.

The above corollary is most useful in later convergence arguments, and hence we have named it "the fundamental property of $V^{(n)}$."

In order to facilitate the proof of some later results, we shall now find another way of expressing $(\lambda_n - 1)/\rho_n$.

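Theorem 2.5: If $\gamma_i \ne -1$, then

$$\frac{\lambda_i - 1}{\rho_i} = -\frac{1}{(r_i, y_i)}.$$

Proof:

$$\frac{\lambda_i - 1}{\rho_i} = \frac{\gamma_i/(\gamma_i + 1) - 1}{\rho_i} = -\frac{1}{\rho_i(\gamma_i + 1)} = -\frac{1}{\rho_i - (r_i, g_i)} = -\frac{1}{(r_i, g^*) - (r_i, g_i)} = -\frac{1}{(r_i, y_i)}.$$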
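Theorem 2.5 replaces the detour through $\gamma_n$ and $\lambda_n$ with a single inner product. The illustrative sketch below (vectors as lists of floats; function names hypothetical) evaluates both sides for given $g_n$, $g^*$, $r_n$; the identity is purely algebraic, so any vectors with $\rho_n \ne 0$ and $\gamma_n \ne -1$ will do.

```python
# Assumption: finite-dimensional vectors as lists; names are hypothetical.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def coefficient_from_8(g, g_star, r):
    """(lambda_n - 1) / rho_n through (6), (7) and (8); assumes gamma_n != -1."""
    rho = dot(g_star, r)
    gamma = -dot(g, r) / rho
    lam = gamma / (1.0 + gamma)
    return (lam - 1.0) / rho


def coefficient_from_theorem_2_5(g, g_star, r):
    """-(r_n, y_n)^{-1} with y_n = g* - g_n."""
    y = [p - q for p, q in zip(g_star, g)]
    return -1.0 / dot(r, y)
```

In floating point the second form is also the safer one: when $\gamma_n$ is large, forming $\lambda_n - 1$ from $\gamma_n/(1 + \gamma_n)$ loses digits to cancellation.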


In view of this, (9) can be written as

$$V^{(n+1)} = V^{(n)} - \frac{B^{(n)}}{(V^{(n)} y_n - \alpha_n s_n, y_n)},$$

and since $\alpha_n s_n = A^{-1} y_n$, we have

$$V^{(n+1)} - V^{(n)} = -\frac{B^{(n)}}{\big((V^{(n)} - A^{-1}) y_n, y_n\big)}, \qquad (15)$$

which yields the following theorem:

Theorem 2.6: If $V^{(0)} \ge A^{-1}$, then $V^{(n)} \ge A^{-1}$ for all $n$; similarly, if $V^{(0)} \le A^{-1}$, then $V^{(n)} \le A^{-1}$ for all $n$.

Proof: We proceed by induction and assume that $V^{(n)} \ge A^{-1}$. If $V^{(n+1)} = V^{(n)}$, i.e., $\gamma_n = -1$, the result is trivial. Otherwise, by (15), (10), and (12),

$$(x, (V^{(n+1)} - A^{-1})x) = (x, (V^{(n)} - A^{-1})x) - \frac{(x, (V^{(n)} - A^{-1})y_n)^2}{(y_n, (V^{(n)} - A^{-1})y_n)}.$$

From the C.B.S. inequality,

$$(x, (V^{(n)} - A^{-1})y_n)^2 \le (x, (V^{(n)} - A^{-1})x)\,(y_n, (V^{(n)} - A^{-1})y_n),$$

so that $(x, (V^{(n+1)} - A^{-1})x) \ge 0$. The second part of the theorem is obtained by merely considering $(x, (A^{-1} - V^{(n+1)})x)$ instead.

The following theorem gives a condition under which the $V^{(n)}$'s form a monotone sequence of self-adjoint, bounded linear operators.

Theorem 2.7: If $V^{(0)} \ge A^{-1}$, then $V^{(n)} \le V^{(n-1)} \le \cdots \le V^{(0)}$ for all $n$. Similarly, if $V^{(0)} \le A^{-1}$, then $V^{(n)} \ge V^{(n-1)} \ge \cdots \ge V^{(0)}$ for all $n$.

Proof: By theorem 2.6, if $V^{(0)} \ge A^{-1}$, then $V^{(n)} \ge A^{-1}$ for all $n$. If $V^{(n+1)} = V^{(n)}$, i.e., $\gamma_n = -1$, then the assertion is obvious. Otherwise, we have

$$(x, (V^{(n+1)} - V^{(n)})x) = -\frac{(x, B^{(n)}x)}{(y_n, (V^{(n)} - A^{-1})y_n)} \le 0$$

by (15). The inequality holds since theorem 2.1 gives $(x, B^{(n)}x) \ge 0$ and from theorem 2.6 $V^{(n)} - A^{-1} \ge 0$. The second part of the theorem follows by considering $V^{(n)} - V^{(n+1)}$ instead.
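Theorems 2.6 and 2.7 can be illustrated in $R^3$ through the form (15), which involves only $V^{(n)}$, $A^{-1}$, and $y_n$; because the proofs use nothing else, the sketch below feeds in fixed vectors $y_n$ instead of running the full algorithm. It is illustrative only: $A^{-1}$, the two starting operators ($I \ge A^{-1}$ and $0.1\,I \le A^{-1}$), and the $y_n$ are hypothetical test data, and positive semidefiniteness is checked with the principal-minor criterion, which is adequate for such small matrices.

```python
from itertools import combinations

# Assumption: R^3 stands in for H; A^{-1}, V^(0) and the y_n are hypothetical test data.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def update_15(V, A_inv, y):
    """V^(n+1) = V^(n) - B^(n) / ((V^(n) - A^{-1}) y, y), with B^(n) x = (x, r) r."""
    n = len(y)
    r = [p - q for p, q in zip(matvec(V, y), matvec(A_inv, y))]  # (12): r = (V - A^{-1}) y
    d = dot(r, y)
    if d == 0.0:   # (r_n, y_n) = 0: r_n = 0 or gamma_n = -1, and V is left unchanged
        return [row[:] for row in V]
    return [[V[i][j] - r[i] * r[j] / d for j in range(n)] for i in range(n)]


def det(M):
    """Determinant by cofactor expansion (fine for 3 x 3 matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def is_psd(M, tol=1e-12):
    """A symmetric M is positive semidefinite iff every principal minor is >= 0."""
    n = len(M)
    return all(det([[M[i][j] for j in idx] for i in idx]) >= -tol
               for k in range(1, n + 1) for idx in combinations(range(n), k))


def diff(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]
```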

Corollary 1: If $V^{(0)} \le A^{-1}$ or $V^{(0)} \ge A^{-1}$, then the $V^{(n)}$'s form a monotone sequence of strongly positive, self-adjoint linear operators bounded by $V^{(0)}$ and $A^{-1}$. Moreover, there exists a strongly positive, self-adjoint operator $V$ such that $\lim_{n \to \infty} V^{(n)}x = Vx$ for all $x \in H$.

Proof: The $V^{(n)}$'s form a bounded monotone sequence of strongly positive, self-adjoint operators by theorems 2.2 and 2.7. That is, if $V^{(0)} \le A^{-1}$, we have $V^{(0)} \le V^{(1)} \le V^{(2)} \le \cdots \le V^{(n)} \le \cdots \le A^{-1}$. This implies the existence of a strongly positive, self-adjoint linear operator $V$ such that $V^{(n)}$ converges to $V$ pointwise [?].

Theorem 2.8: If $V^{(0)} \le A^{-1}$ or $V^{(0)} \ge A^{-1}$, if $\gamma_n \ne -1$ for all $n$, and if $S$ is the closure of the space spanned by $\{y_0, y_1, y_2, \ldots\}$, then $\lim_{n \to \infty} V^{(n)}x = A^{-1}x$ for all $x \in S$, independent of the choice of the $\alpha_n$'s. (By the closure of the space spanned by a set $M$, we mean the smallest topologically closed subspace containing $M$.)

Proof: For any $x \in S$ there exist $\beta_i \in R$ such that

$$x = \sum_{i=0}^{\infty} \beta_i y_i. \qquad (16)$$

Consider

$$(A^{-1} - V^{(n)})x = (A^{-1} - V^{(n)}) \sum_{i=0}^{n-1} \beta_i y_i + (A^{-1} - V^{(n)}) \sum_{i=n}^{\infty} \beta_i y_i.$$

By the corollary to theorem 2.4, $(A^{-1} - V^{(n)}) \sum_{i=0}^{n-1} \beta_i y_i = \theta$. Since $V^{(0)} \ge A^{-1}$ or $V^{(0)} \le A^{-1}$, by theorem 2.7 and its corollary it must be that $\|V^{(n)}\| \le \|A^{-1}\|$ or $\|V^{(n)}\| \le \|V^{(0)}\|$. So $\|A^{-1} - V^{(n)}\|$ is bounded for all $n$, and by (16) the remainder $\sum_{i=n}^{\infty} \beta_i y_i$ must go to $\theta$ as $n \to \infty$. So we have

$$\lim_{n \to \infty} \|A^{-1}x - V^{(n)}x\| = 0.$$
Corollary: If $V^{(0)} \le A^{-1}$ or $V^{(0)} \ge A^{-1}$, $\gamma_n \ne -1$ for all $n$, and the $y_n$ form a basis for $H$, then $V^{(n)} \to A^{-1}$ pointwise, independent of the choice of $\alpha_n$.

Notice that all these results have been established without regard to the choice of $\alpha_n$. We called $r_n$ as defined in (3) and (4) a residual vector. The reason for this terminology will now be explained.

Suppose $r_n = \theta$ for some $n$. Then $V^{(n)} y_n = \alpha_n s_n$ by (5), and if $(V^{(n)})^{-1}$ exists, $y_n = \alpha_n (V^{(n)})^{-1} s_n = -\alpha_n g_n$ by (2); that is, $y_n = g^* - g_n = -\alpha_n g_n$. By (1.14), $g^* = g_n + \alpha_n A s_n$. Therefore, $\alpha_n A s_n = -\alpha_n g_n$, so that (for $\alpha_n \ne 0$) $s_n = -A^{-1}g_n$. Hence, since $s_n = -V^{(n)}g_n$, we have $V^{(n)}g_n = A^{-1}g_n$.

As we have seen in chapter 1 (theorem 1.2), the minimum of $J$ is attained at $\bar{x} = x_n - A^{-1}g_n$. In the basic algorithm outlined in section 1 of this chapter, step 2 says that if $r_n = \theta$ we let $\alpha_n = 1$ and repeat step 1. Then the new $x^*$ is $x^* = x_n - V^{(n)}g_n$, and we have shown above that $s_n = -A^{-1}g_n$; hence, $V^{(n)}g_n = A^{-1}g_n$. Therefore, by theorem 1.2, $x^*$ is the location of the minimum of $J$. This explains the reason for step 2, and we have proved the following:

Theorem 2.9: If $r_n = \theta$ and $(V^{(n)})^{-1}$ exists, then by applying step 2 of the basic algorithm we let $\alpha_n = 1$, and we find that the resulting $x^*$, given by $x^* = x_n - V^{(n)}g_n$, is the location of the minimum of $J$.
2.3 Convergence if $\alpha_n$ Is Chosen by a One Dimensional Minimization Process
There are two rather obvious ways to choose $\alpha_n$ at each step: (1) let $\alpha_n = 1$ for all $n$, and (2) let $\alpha_n$ be such that $J(x_n + \alpha_n s_n) \le J(x_n + \lambda s_n)$ for all real $\lambda$. Both cases have been investigated by Davidon and Goldfarb, and convergence has been established in the case of a quadratic functional on a finite dimensional Hilbert space.

We shall now demonstrate the convergence of the algorithm of section 1 to the location of the minimum of a quadratic functional on an infinite dimensional Hilbert space when $\alpha_n$ is chosen for every $n$ so that

$$J(x_n + \alpha_n s_n) \le J(x_n + \lambda s_n) \qquad (17)$$

for all real $\lambda$. This, of course, implies that $x_{n+1} = x^*$ in step 5 of the algorithm given in section 1. If $\alpha_n$ is chosen in this manner, then, by necessity,

$$\frac{dJ(x_n + \lambda s_n)}{d\lambda} = 0 \qquad (18)$$

at $\lambda = \alpha_n$. That is, $(g^*, s_n) = (g(x_n + \alpha_n s_n), s_n) = 0$, so that from (1.7) and (1.14) we have

$$\alpha_n = -\frac{(s_n, g_n)}{(s_n, A s_n)}. \qquad (19)$$


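Therefore,

$$\begin{aligned}
J(x_1) &= J_0 + (b, x_1) + \tfrac{1}{2}(x_1, Ax_1) \\
&= J_0 + (b, x_0 + \alpha_0 s_0) + \tfrac{1}{2}(x_0 + \alpha_0 s_0, A(x_0 + \alpha_0 s_0)) \\
&= J_0 + (b, x_0) + \tfrac{1}{2}(x_0, Ax_0) + \alpha_0\big[(s_0, b) + (s_0, Ax_0)\big] + \tfrac{\alpha_0^2}{2}(s_0, As_0) \\
&= J(x_0) + \alpha_0(s_0, g_0) + \tfrac{\alpha_0^2}{2}(s_0, As_0) && \text{(by (1.6), (1.7))} \\
&= J(x_0) - \frac{1}{2}\frac{(s_0, g_0)^2}{(s_0, As_0)} && \text{(by (19))}.
\end{aligned}$$

In general,

$$J(x_{n+1}) = J(x_0) - \frac{1}{2}\sum_{j=0}^{n} \frac{(s_j, g_j)^2}{(s_j, As_j)}.$$

Since $\inf J > -\infty$ and $J(x_{n+1}) \le J(x_n)$, it must be that

$$\lim_{n \to \infty} J(x_{n+1}) = J(x_0) - \lim_{n \to \infty} \frac{1}{2}\sum_{i=0}^{n} \frac{(s_i, g_i)^2}{(s_i, As_i)} > -\infty,$$

so that

$$\sum_{i=0}^{\infty} \frac{(s_i, g_i)^2}{(s_i, As_i)} < \infty,$$

which implies that, by necessity,

$$\lim_{i \to \infty} \frac{(s_i, g_i)^2}{(s_i, As_i)} = 0. \qquad (20)$$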

Since its derivation in no way depended on (2), (20) must be true for any descent method in which $\alpha_n$ is chosen by (17). This result and the following lemma are given by Horwitz and Sarachik [20]. They used them to prove convergence of Davidon's first method, steepest descent, and the conjugate gradient method in an infinite dimensional real Hilbert space for the problem under consideration.

Lemma 2.1: If $g_n \to \theta$ as $n \to \infty$, then $x_n$ converges in norm to the location of the minimum $\bar{x} = -A^{-1}b$.

Proof:

$$0 \le \big(x_n + A^{-1}b, A(x_n + A^{-1}b)\big) = (x_n + A^{-1}b, g_n) \le \|x_n + A^{-1}b\| \cdot \|g_n\| \to 0.$$

Now $\|x_n + A^{-1}b\|$ is bounded for all $n$, since for all $n$, $x_n$ is contained in a bounded set, namely $S_{x_0} = \{x \in H : J(x) \le J(x_0)\}$, as in chapter 1. Hence, $\lim_{n\to\infty}\big(x_n + A^{-1}b, A(x_n + A^{-1}b)\big) = 0$, and since $A$ is strongly positive, we have $\lim_{n\to\infty}(x_n + A^{-1}b) = \theta$.

We can now prove a general convergence theorem for this case.

Theorem 2.10: If there exist positive reals $\alpha, \beta$ such that $\alpha I \le V^{(n)} \le \beta I$ for all $n$ larger than some $N$, and if $\alpha_n$ is chosen as in (17), then $\lim_{n\to\infty}\|x_n + A^{-1}b\| = 0$; that is, $x_n$ converges in norm to the location of the minimum.

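Proof: Since for all $u \in H$, $m\|u\|^2 \le (u, Au) \le M\|u\|^2$, we have, for $u \ne \theta$,

$$\frac{1}{M\|u\|^2} \le \frac{1}{(u, Au)} \le \frac{1}{m\|u\|^2},$$

and since $\alpha\|u\|^2 \le (u, V^{(n)}u) \le \beta\|u\|^2$ for all $n > N$,

$$\frac{1}{\beta\|u\|^2} \le \frac{1}{(u, V^{(n)}u)} \le \frac{1}{\alpha\|u\|^2}.$$

Since $V^{(n)}$ is self-adjoint, we have $\|V^{(n)}u\| \le \beta\|u\|$. Therefore, using $s_k = -V^{(k)} g_k$,

$$\frac{(s_k, g_k)^2}{(s_k, A s_k)} \ge \frac{(s_k, g_k)^2}{M\|s_k\|^2} = \frac{(g_k, V^{(k)} g_k)^2}{M\|V^{(k)} g_k\|^2} \ge \frac{(g_k, V^{(k)} g_k)^2}{M\beta^2\|g_k\|^2} \ge \frac{\alpha^2\big(\|g_k\|^2\big)^2}{M\beta^2\|g_k\|^2} = \frac{\alpha^2}{M\beta^2}\|g_k\|^2.$$

Therefore, by (20), $\|g_k\|^2 \to 0$, and by lemma 2.1, $x_k \to -A^{-1}b$ in norm.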
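Corollary 1: If $V^{(0)} \ge A^{-1}$ or $V^{(0)} \le A^{-1}$, and $\alpha_n$ is chosen as in (17), then $J(x_n)$ converges to the minimum of $J(x)$, and moreover $x_n$ converges in norm to the location of the minimum.

Proof: If $V^{(0)} \le A^{-1}$, then by theorems 2.6 and 2.7 we have $V^{(0)} \le V^{(n)} \le A^{-1}$ for all $n$. Hence, $m_0 I \le V^{(n)} \le \frac{1}{m} I$ for all $n$, where $m_0 > 0$ is the lower bound of the strongly positive operator $V^{(0)}$. Similarly, if $V^{(0)} \ge A^{-1}$, then $A^{-1} \le V^{(n)} \le V^{(0)}$, so that $\frac{1}{M} I \le V^{(n)} \le \|V^{(0)}\| I$ for all $n$. In either case theorem 2.10 applies, and $J(x_n) \to J(-A^{-1}b)$ by the continuity of $J$.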
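To see theorem 2.10 and corollary 1 at work, the following minimal sketch runs the rank one iteration with $\alpha_n$ from (17) on a three-dimensional quadratic. It assumes $H = \mathbb{R}^3$, $V^{(0)} = I$, and that the update takes the symmetric rank one form $V^{(n+1)} = V^{(n)} - r_n r_n^{T}/(r_n, y_n)$ with $r_n = V^{(n)} y_n - \sigma_n$, which is the form used in the proof of theorem 3.1. The data and the helper name `rank_one_exact` are hypothetical, and the rule that skips the update when $(r_n, y_n)$ is negligible is a common safeguard, not part of the text. For this $A$ the eigenvalues are $3 - \sqrt{3}$, $3$ and $3 + \sqrt{3}$, all larger than 1, so $V^{(0)} = I \ge A^{-1}$ and the hypothesis of corollary 1 holds.

```python
# Minimal sketch of the rank one (Davidon-Broyden) iteration with the exact
# line search (17) on J(x) = (b, x) + 0.5 (x, A x) in R^n.
# Assumed update (form used in the proof of theorem 3.1):
#     V <- V - r r^T / (r, y),   r = V y - sigma.

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(M, u):
    return [dot(row, u) for row in M]


def rank_one_exact(A, b, x0, tol=1e-10, max_iter=50):
    n = len(x0)
    x = list(x0)
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # V^(0) = I
    g = [ax + bi for ax, bi in zip(matvec(A, x), b)]                    # g = A x + b
    for k in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            return x, k
        s = [-v for v in matvec(V, g)]                  # s_n = -V^(n) g_n
        alpha = -dot(s, g) / dot(s, matvec(A, s))       # alpha_n from (19)
        sigma = [alpha * si for si in s]                # sigma_n = alpha_n s_n
        x = [xi + di for xi, di in zip(x, sigma)]       # x_{n+1} = x*
        g_new = [ax + bi for ax, bi in zip(matvec(A, x), b)]
        y = [a - c for a, c in zip(g_new, g)]           # y_n = g* - g_n
        r = [vy - d for vy, d in zip(matvec(V, y), sigma)]
        ry = dot(r, y)
        # Safeguard (not from the text): skip the update when (r, y) is negligible.
        if abs(ry) > 1e-12 * (dot(r, r) * dot(y, y)) ** 0.5:
            V = [[V[i][j] - r[i] * r[j] / ry for j in range(n)] for i in range(n)]
        g = g_new
    return x, max_iter


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]
x, steps = rank_one_exact(A, b, [0.0, 0.0, 0.0])
residual = max(abs(ax + bi) for ax, bi in zip(matvec(A, x), b))
print(steps, x, residual)
```

In exact arithmetic the gradient vanishes after at most three steps here; in floating point the residual is at rounding level.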


2.4 Convergence with a More General Choice of $\alpha_n$
Let $\{\alpha_k\}$ denote a sequence of real numbers. We then apply the algorithm outlined in section 1, using these $\alpha_k$'s in step 1, to minimize the quadratic functional discussed in chapter 1, section 3. Select a subsequence $K = \{\alpha_{n_k}\}$ so that $J(x^*) < J(x_{n_k})$ for all $k = 0, 1, 2, \ldots$. To simplify the notation, let us write $n$ for $n_k$.


Then we have

$$g_n = g_0 + (g_1 - g_0) + (g_2 - g_1) + \cdots + (g_n - g_{n-1}),$$

$$g_n = g_0 + \sum_{i=0}^{n-1} y_i,$$

since $y_i = g_{i+1} - g_i$. Then

$$V^{(n)} g_n = V^{(n)} g_0 + \sum_{i=0}^{n-1} V^{(n)} y_i. \qquad (21)$$

From the corollary to theorem 2.4, $V^{(n)} y_i = \alpha_i s_i = \sigma_i$. Further, from step 5 we have $x_1 = x_0 + \sigma_0$ and $x_2 = x_1 + \sigma_1 = x_0 + \sigma_0 + \sigma_1$, etc., so that

$$x_n = x_0 + \sum_{i=0}^{n-1} \sigma_i. \qquad (22)$$

From equations (1), (21), and (22), we have

$$x^* = x_n - \alpha_n V^{(n)} g_n = x_0 + \sum_{i=0}^{n-1}\sigma_i - \alpha_n\Big(V^{(n)} g_0 + \sum_{i=0}^{n-1}\sigma_i\Big).$$

Hence,

$$x^* = x_0 - \alpha_n V^{(n)} g_0 + (1 - \alpha_n)\sum_{i=0}^{n-1}\sigma_i.$$


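Now let us consider

$$\|x^* - (-A^{-1}b)\| = \Big\|A^{-1}b + x_0 - \alpha_n V^{(n)} g_0 + (1 - \alpha_n)\sum_{i=0}^{n-1}\sigma_i\Big\| = \Big\|A^{-1}b + A^{-1}A x_0 - \alpha_n V^{(n)} g_0 + (1 - \alpha_n)\sum_{i=0}^{n-1}\sigma_i\Big\|.$$

Hence, since $A^{-1}b + A^{-1}A x_0 = A^{-1}(A x_0 + b) = A^{-1} g_0$,

$$\|x^* + A^{-1}b\| = \Big\|(A^{-1} - \alpha_n V^{(n)}) g_0 + (1 - \alpha_n)\sum_{i=0}^{n-1}\sigma_i\Big\|. \qquad (24)$$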
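The representation $x^* = x_0 - \alpha_n V^{(n)} g_0 + (1 - \alpha_n)\sum_{i<n}\sigma_i$, and hence (24), rests only on (21), (22) and the hereditary property $V^{(n)} y_i = \sigma_i$. The sketch below checks it numerically for an arbitrary prescribed sequence $\alpha_0, \alpha_1, \alpha_2$. It assumes $H = \mathbb{R}^3$, $V^{(0)} = I$, the symmetric rank one update $V^{(n+1)} = V^{(n)} - r_n r_n^{T}/(r_n, y_n)$ (an assumption about the form of the update), and that every trial point is accepted, so that $x_{n+1} = x^*$ as in the subsequence convention of this section. The numbers are illustrative.

```python
# Minimal sketch: check x* = x_0 - alpha_n V^(n) g_0 + (1 - alpha_n) sum sigma_i
# after n = 2 steps with prescribed step lengths (illustrative values).

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(M, u):
    return [dot(row, u) for row in M]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]


def grad(x):
    return [ax + bi for ax, bi in zip(matvec(A, x), b)]      # g = A x + b


alphas = [0.9, 1.2, 0.7]                 # prescribed alpha_0, alpha_1, alpha_2
x0 = [0.3, -0.2, 0.1]
x = list(x0)
V = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]   # V^(0) = I
sigmas = []
for k in range(2):                       # two accepted steps
    g = grad(x)
    sigma = [-alphas[k] * v for v in matvec(V, g)]       # sigma_k = -alpha_k V^(k) g_k
    x_new = [xi + si for xi, si in zip(x, sigma)]
    y = [a - c for a, c in zip(grad(x_new), g)]
    r = [vy - si for vy, si in zip(matvec(V, y), sigma)]
    ry = dot(r, y)                       # positive here, since V^(k) >= A^{-1}
    V = [[V[i][j] - r[i] * r[j] / ry for j in range(3)] for i in range(3)]
    sigmas.append(sigma)
    x = x_new

n = 2
x_star = [xi - alphas[n] * v for xi, v in zip(x, matvec(V, grad(x)))]
Vg0 = matvec(V, grad(x0))
rhs = [x0[i] - alphas[n] * Vg0[i] + (1 - alphas[n]) * sum(s[i] for s in sigmas)
       for i in range(3)]
gap = max(abs(a - c) for a, c in zip(x_star, rhs))
print(gap)
```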


In order to establish convergence, we must show that $\|x^* + A^{-1}b\|$ can be made small as $n \to \infty$. Let $S_{x_0} = \{x \in H : J(x) \le J(x_0)\}$ as in chapter 1. Since it is known that $S_{x_0}$ is bounded [25], we can prove the following:

Lemma 2.2: If $n(\alpha_n - 1) \to 0$ as $n \to \infty$, and there exist $\alpha, \beta > 0$ such that $\alpha I \le V^{(n)} \le \beta I$ and $\gamma_n \ne -1$ for all $n$, then $\big\|(1 - \alpha_n)\sum_{i=0}^{n-1}\sigma_i\big\| \to 0$ as $n \to \infty$.

Proof: $\|\sigma_i\| = \|\alpha_i s_i\| = \|-\alpha_i V^{(i)} g_i\|$ (by definition), so

$$\|\sigma_i\| = |\alpha_i|\,\|V^{(i)}(A x_i + b)\| \le |\alpha_i|\left\{\|V^{(i)}\|\,\|A\|\,\|x_i\| + \|V^{(i)}\|\,\|b\|\right\}. \qquad (25)$$

Since $x_i \in S_{x_0}$, a bounded set, $\|x_i\|$ is bounded, and since $\alpha_i \to 1$ as $i \to \infty$, $\alpha_i$ is bounded. By hypothesis $\|V^{(i)}\| \le \beta$ and $\|A\| \le M$, so the right side of (25) is bounded by a constant independent of $i$, and $\|\sigma_i\| \le L$ for some $L > 0$ and all $i$. Hence,

$$\Big\|(1 - \alpha_n)\sum_{i=0}^{n-1}\sigma_i\Big\| \le |1 - \alpha_n| \cdot L \cdot n \to 0,$$

since $(\alpha_n - 1)n \to 0$ as $n \to \infty$.

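Lemma 2.3: If $g_0$ is an element of the smallest closed subspace containing the $y_i$'s, denoted by $S(y_i)$, if the $V^{(n)}$'s are uniformly bounded, $\alpha_n \to 1$ as $n \to \infty$, and if $\gamma_n \ne -1$ for all $n$, then $\|A^{-1}g_0 - \alpha_n V^{(n)} g_0\| \to 0$ as $n \to \infty$.

Proof: By hypothesis there exist scalars $\beta_i$ such that

$$g_0 = \sum_{i=0}^{\infty}\beta_i y_i, \quad\text{so}\quad A^{-1}g_0 = \sum_{i=0}^{\infty}\beta_i A^{-1}y_i = \sum_{i=0}^{\infty}\beta_i\sigma_i \quad \text{(by (1.12))}.$$

Since $V^{(n)} y_i = \sigma_i$ for $i < n$ (corollary to theorem 2.4), consider

$$\begin{aligned}
\|A^{-1}g_0 - \alpha_n V^{(n)} g_0\| &= \Big\|\sum_{i=0}^{\infty}\beta_i\sigma_i - \alpha_n\sum_{i=0}^{n-1}\beta_i\sigma_i - \alpha_n V^{(n)}\sum_{i=n}^{\infty}\beta_i y_i\Big\|\\
&\le |1 - \alpha_n|\,\Big\|\sum_{i=0}^{n-1}\beta_i\sigma_i\Big\| + \Big\|\sum_{i=n}^{\infty}\beta_i\sigma_i\Big\| + |\alpha_n|\,\|V^{(n)}\|\,\Big\|\sum_{i=n}^{\infty}\beta_i y_i\Big\|.
\end{aligned}$$

Since $A^{-1}g_0 = \sum_{i=0}^{\infty}\beta_i\sigma_i$ and $g_0 = \sum_{i=0}^{\infty}\beta_i y_i$, we know that $\big\|\sum_{i=0}^{n-1}\beta_i\sigma_i\big\|$ is bounded for all $n$, and that $\big\|\sum_{i=n}^{\infty}\beta_i\sigma_i\big\| \to 0$ and $\big\|\sum_{i=n}^{\infty}\beta_i y_i\big\| \to 0$ as $n \to \infty$. Since $\alpha_n \to 1$, we know that $|1 - \alpha_n| \to 0$ and that $\alpha_n$ is bounded. Hence, $\|A^{-1}g_0 - \alpha_n V^{(n)} g_0\| \to 0$ as $n \to \infty$.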

We can now assert the following:

Theorem 2.11: If $g_0 \in S(y_i)$, $\gamma_n \ne -1$ for all $n$, the $V^{(n)}$ are uniformly bounded, and $(\alpha_n - 1)n \to 0$ as $n \to \infty$, then $\|x^* + A^{-1}b\| \to 0$ as $n \to \infty$.

Proof: By (24) we have

$$\|x^* + A^{-1}b\| = \Big\|(A^{-1} - \alpha_n V^{(n)}) g_0 + (1 - \alpha_n)\sum_{i=0}^{n-1}\sigma_i\Big\| \le \big\|(A^{-1} - \alpha_n V^{(n)}) g_0\big\| + \Big\|(1 - \alpha_n)\sum_{i=0}^{n-1}\sigma_i\Big\|,$$

and by lemma 2.3 the first term goes to zero. By lemma 2.2 the second term goes to zero.

In this chapter, we have established conditions under which two variations of the basic algorithm converge to the location of the minimum of a quadratic functional. These are given in theorems 2.10 and 2.11. In both of these theorems we are most interested in the convergence question for an infinite dimensional Hilbert space. In a finite dimensional space of dimension $n$, we see that for almost any collection of $\alpha_n$'s the algorithm converges to the location of the minimum in a finite number of steps. The conditions on the $\alpha_n$'s and the proof are given in the following theorem.
Theorem 2.12: If $\gamma_j \ne -1$ and $\alpha_j \ne 0$ for all $j = 0, 1, \ldots$, and if $(V^{(j)})^{-1}$ exists for all $j$, then after at most $n + 1$ steps $x^* = -A^{-1}b$, where $n = \dim H$.


Proof: First we show that the $y_i$'s form a linearly independent set if $r_i \ne \theta$. Assume that $\{y_i\}_{i=0}^{l}$ is linearly dependent for some $l$, and choose $l$ to be the smallest such index, so that $\{y_i\}_{i=0}^{l-1}$ is linearly independent. Therefore, there exist scalars $\beta_i$ such that

$$y_l = \sum_{i=0}^{l-1}\beta_i y_i. \qquad (26)$$

By (12) and theorem 2.9,

$$(V^{(j)} - A^{-1})\, y_j = r_j, \qquad j = 0, 1, 2, \ldots, l. \qquad (27)$$

Moreover, by the fundamental property of $V^{(j)}$,

$$(A^{-1} - V^{(j)})\, y_i = \theta \quad \text{for } i < j. \qquad (28)$$

By operating on (26) by $(A^{-1} - V^{(l)})$ and applying (27) and (28), we have

$$-r_l = (A^{-1} - V^{(l)})\, y_l = \sum_{i=0}^{l-1}\beta_i (A^{-1} - V^{(l)})\, y_i = \sum_{i=0}^{l-1}\beta_i\,\theta = \theta.$$

Hence, if $\{y_i\}_{i=0}^{l}$ are linearly dependent, then $r_l = \theta$. Therefore, by step 4 of the algorithm $\alpha_l$ is reset to 1, and by theorem 2.9 the resulting $x^*$ is the location of the minimum. Hence, the theorem is true if $\{y_i\}_{i=0}^{l}$ are linearly dependent for some $l < n$.

Since $H$ is finite dimensional of dimension $n$, we have at most $n$ linearly independent $y$'s. Now, if we apply the algorithm $n$ times and the resulting $r_n \ne \theta$, we have generated $n$ linearly independent $y$'s, and they must form a basis for $H$. Moreover, by the fundamental property of $V^{(n)}$, i.e., theorem 2.4 and its corollary, we have $V^{(n)} y_i = A^{-1} y_i$, $i = 0, 1, 2, \ldots, n - 1$. Since the two linear operators $V^{(n)}$ and $A^{-1}$ agree on the $y_i$'s, a basis for the space, it must be that

$$V^{(n)} = A^{-1} \quad \text{on the whole space.} \qquad (29)$$

Hence, by the definition of $x^*$, (29), and (1.10), we have

$$x^* = x_n - \alpha_n V^{(n)} g_n = x_n - \alpha_n A^{-1} g_n = x_n - \alpha_n x_n - \alpha_n A^{-1} b. \qquad (30)$$

Now from (3),

$$\begin{aligned}
r_n &= V^{(n)}(g^* - g_n) + \alpha_n V^{(n)} g_n\\
&= A^{-1}(g^* - g_n) + \alpha_n A^{-1} g_n && \text{by (29)}\\
&= x^* - x_n + \alpha_n A^{-1} g_n && \text{by (1.12)}\\
&= x^* - x_n + \alpha_n A^{-1}(A x_n + b) && \text{by (1.6)}\\
&= x^* - x_n + \alpha_n x_n + \alpha_n A^{-1} b\\
&= \theta && \text{by (30).}
\end{aligned}$$

So by step 4 of the algorithm $\alpha_n$ is reset to one, and by theorem 2.9 $x^*$ is the location of the minimum.
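Theorem 2.12 can be illustrated in $\mathbb{R}^3$: with arbitrary nonzero step lengths, after $n = 3$ updates $V^{(3)}$ coincides with $A^{-1}$ as in (29), and one further step with $\alpha = 1$ lands on $-A^{-1}b$. The sketch below is a minimal illustration, assuming illustrative $A$, $b$, $x_0$ and $\alpha_j$, $V^{(0)} = I$, the symmetric rank one update $V^{(n+1)} = V^{(n)} - r_n r_n^{T}/(r_n, y_n)$ (an assumption about the form of the update), and acceptance of every trial point. It measures how far $V^{(3)} A$ is from the identity, entrywise.

```python
# Minimal sketch: with arbitrary nonzero step lengths, n = dim H rank one
# updates reproduce A^{-1} (theorem 2.12). Illustrative data; every trial
# point is accepted.

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(M, u):
    return [dot(row, u) for row in M]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]


def grad(x):
    return [ax + bi for ax, bi in zip(matvec(A, x), b)]


n = 3
x = [0.0, 0.0, 0.0]
V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # V^(0) = I
for alpha in (0.5, 1.5, -0.8):                                      # nonzero alpha_j
    g = grad(x)
    sigma = [-alpha * v for v in matvec(V, g)]
    x_new = [xi + si for xi, si in zip(x, sigma)]
    y = [a - c for a, c in zip(grad(x_new), g)]
    r = [vy - si for vy, si in zip(matvec(V, y), sigma)]
    ry = dot(r, y)
    V = [[V[i][j] - r[i] * r[j] / ry for j in range(n)] for i in range(n)]
    x = x_new

# (29): V^(n) A should be the identity.
VA = [[sum(V[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
err = max(abs(VA[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n))
# One more step with alpha = 1 gives x* = -A^{-1} b, so its gradient vanishes.
x_final = [xi - v for xi, v in zip(x, matvec(V, grad(x)))]
print(err, max(abs(t) for t in grad(x_final)))
```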

Many times in this chapter we have proved results dependent upon $\gamma_n \ne -1$. We shall continue to do this in subsequent chapters. For this reason, we shall investigate the case $\gamma_n = -1$. From (6) and (7), $\gamma_n = -1$ means that $(g_n, r_n) = (g^*, r_n)$, which implies that

$$(y_n, r_n) = 0. \qquad (31)$$

Now we know from theorem 2.9 that if $(V^{(n)})^{-1}$ exists and (31) holds because $r_n = \theta$, then convergence is achieved on the next iteration with $\alpha_n = 1$. Also, if (31) holds because $y_n = \theta$, then $g^* = g_n$, and by (1.7) $A x^* + b = A x_n + b$, or $x^* = x_n$. But if $V^{(n)} > 0$, this contradicts (1), since $g_n \ne \theta$.

Now by (5) and (1.17), $r_n = (V^{(n)} - A^{-1})\, y_n$; hence, (31) can be written as

$$\big(y_n, (V^{(n)} - A^{-1})\, y_n\big) = 0, \qquad (32)$$

and if $V^{(n)} > A^{-1}$ or $V^{(n)} < A^{-1}$, then (32) is impossible for $y_n \ne \theta$. Theorem 2.6 states that if $V^{(0)} > A^{-1}$ or $V^{(0)} < A^{-1}$, then $V^{(n)} > A^{-1}$ or $V^{(n)} < A^{-1}$, respectively, for all $n$.

Moreover, the convergence of the iterates to the location of the minimum of a quadratic functional assured by theorem 2.10 and its corollary is independent of $\gamma_n$. Hence, if $\gamma_n = -1$, then $\alpha_n$ should be computed by (17).
3. COMPARISON WITH OTHER CONJUGATE GRADIENT TECHNIQUES

If the functional to be minimized is quadratic as discussed in chapter 1, then Myers and Horwitz and Sarachik [20] have shown that whenever $H^{(0)} = I$ the DFP technique generates the same search directions as those given by the conjugate gradient method. Here, we shall examine the relationship between these two methods and the method discussed in chapter 2 with $\alpha_n$ chosen as in (2.17), assuming that the functional to be minimized is quadratic.

That is, throughout chapter 3 we shall assume that $\alpha_n$ satisfies $J(x_n + \alpha_n s_n) \le J(x_n + \lambda s_n)$ for all real $\lambda$, and that $J(x) = J_0 + (b, x) + \frac{1}{2}(x, Ax)$, as in chapter 1.

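Theorem 3.1: If $\gamma_i \ne -1$ for all $i$, then the $\sigma_i$ generated by the algorithm outlined in chapter 2 are $A$-conjugate and the $y_i$ are $A^{-1}$-conjugate, i.e.,

$$(\sigma_i, A\sigma_j) = (y_i, A^{-1} y_j) = (\sigma_j, y_i) = 0 \quad \text{if } i \ne j, \qquad (1)$$

$$(g_k, s_i) = \begin{cases} 0 & \text{if } 0 \le i < k,\\ (g_i, s_i) & \text{if } 0 \le k \le i, \end{cases} \qquad (2)$$

and also

$$V^{(k)} A\sigma_i = \sigma_i \qquad (3)$$

holds for all $i < k$.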
Proof: (By mathematical induction) By (2.4), $r_n = V^{(n)} y_n - \sigma_n$, so that

$$\sigma_n = V^{(n)} y_n - r_n, \qquad (4)$$

where $\sigma_n = \alpha_n s_n$.

By (1.12), $A\sigma_0 = y_0$, so that

$$\begin{aligned}
V^{(1)} A\sigma_0 &= V^{(1)} y_0\\
&= V^{(0)} y_0 - \frac{(r_0, y_0)\, r_0}{(r_0, y_0)} && \text{(by (2.9) and theorem 2.5)}\\
&= \sigma_0 && \text{(by (4) with } n = 0\text{)}.
\end{aligned}$$

Hence, $(\sigma_0, A\sigma_1) = (\sigma_0, A(-\alpha_1 V^{(1)} g_1)) = -\alpha_1 (V^{(1)} A\sigma_0, g_1)$, since $A$ and $V^{(1)}$ are self-adjoint. Therefore, $(\sigma_0, A\sigma_1) = -\alpha_1 (\sigma_0, g_1)$, since $V^{(1)} A\sigma_0 = \sigma_0$. Hence, $(\sigma_0, A\sigma_1) = -\alpha_1 \cdot 0 = 0$, since $\alpha_0$ was chosen to give the minimum in the direction $s_0$, so that $(\sigma_0, g_1) = 0$ by (2.18). Hence, the theorem is true for $k = 1$. We shall now assume that $(\sigma_j, A\sigma_i) = 0$ if $0 \le j < i \le k$ and $V^{(k)} A\sigma_i = \sigma_i$ if $0 \le i < k$. By (1.7) and (2.1), for $i < k$,

$$\begin{aligned}
g_k &= b + A x_k = b + A(x_{k-1} + \sigma_{k-1})\\
&= b + A(x_{i+1} + \sigma_{i+1} + \cdots + \sigma_{k-1})\\
&= g_{i+1} + A\sigma_{i+1} + \cdots + A\sigma_{k-1}.
\end{aligned}$$
Therefore,

$$(\sigma_i, g_k) = (\sigma_i, g_{i+1}) + (\sigma_i, A\sigma_{i+1}) + \cdots + (\sigma_i, A\sigma_{k-1}) = 0 + 0 + \cdots + 0 = 0;$$

that is,

$$(\sigma_i, g_k) = 0 \qquad (5)$$

by the choice of $\alpha_i$, since $(\sigma_i, g_{i+1}) = 0$, and the other terms are zero by the induction hypothesis. So we have established the first part of (2) for $i < k$.

Now for $i < k$ we can see that

$$\begin{aligned}
(\sigma_i, A\sigma_k) &= (\sigma_i, -\alpha_k A V^{(k)} g_k) && \text{(by (2.1))}\\
&= -\alpha_k (V^{(k)} A\sigma_i, g_k) && \text{(since } A \text{ and } V^{(k)} \text{ are self-adjoint)}\\
&= -\alpha_k (\sigma_i, g_k) = 0 \qquad (6)
\end{aligned}$$

by the induction hypothesis and (5). For a quadratic functional, $A\sigma_i = y_i$ by (1.12); hence, by substitution into (6) we have proved (1).
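We consider, for $i < k$,

$$\begin{aligned}
V^{(k+1)} A\sigma_i &= V^{(k)} A\sigma_i - \frac{(r_k, A\sigma_i)\, r_k}{(r_k, y_k)} && \text{(by def. of } V^{(k+1)}\text{)}\\
&= \sigma_i - \frac{(V^{(k)} y_k - \sigma_k, A\sigma_i)\, r_k}{(r_k, y_k)} && \text{(by def. of } r_k\text{)}\\
&= \sigma_i - \frac{(V^{(k)} y_k, A\sigma_i)\, r_k}{(r_k, y_k)} && \text{(since } (\sigma_k, A\sigma_i) = 0\text{)}\\
&= \sigma_i - \frac{(y_k, V^{(k)} A\sigma_i)\, r_k}{(r_k, y_k)} && \text{(by theorem 2.1)}\\
&= \sigma_i - \frac{(y_k, \sigma_i)\, r_k}{(r_k, y_k)} && \text{(by the corollary to theorem 2.4)}\\
&= \sigma_i - \frac{(A\sigma_k, \sigma_i)\, r_k}{(r_k, y_k)} = \sigma_i && \text{(since } A\sigma_k = y_k \text{ and } (A\sigma_k, \sigma_i) = 0 \text{ for } i < k\text{)}.
\end{aligned}$$

Moreover, by (2.9) and (4),

$$V^{(k+1)} A\sigma_k = V^{(k+1)} y_k = V^{(k)} y_k - \frac{(r_k, y_k)\, r_k}{(r_k, y_k)} = V^{(k)} y_k - r_k = \sigma_k.$$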


Hence we have established (1) and (3), and the first half of (2). We know that $x_k = x_{k+1} - \sigma_k$, and hence $x_k = x_i - \sigma_{i-1} - \cdots - \sigma_k$ for $i > k$. Then $g_k = A x_k + b = A x_i + b - A(\sigma_k + \cdots + \sigma_{i-1})$. Hence,

$$g_k = g_i - A(\sigma_k + \cdots + \sigma_{i-1}),$$

$$(g_k, s_i) = (g_i, s_i) - \sum_{j=k}^{i-1}(A\sigma_j, s_i) = (g_i, s_i) - \sum_{j=k}^{i-1} 0 = (g_i, s_i)$$

for $k < i$, by (1) (the case $k = i$ is trivial). This is the second half of (2).
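The relations (1) and (2) of theorem 3.1 are easy to observe numerically. The following minimal sketch assumes $H = \mathbb{R}^3$, an illustrative quadratic, $V^{(0)} = I$, exact line search, and the symmetric rank one update $V^{(k+1)} = V^{(k)} - r_k r_k^{T}/(r_k, y_k)$ used in the proof above. It records $\sigma_0, \sigma_1, \sigma_2$, the directions $s_i$ and the gradients, then reports the largest $|(\sigma_i, A\sigma_j)|$ for $i \ne j$, the largest $|(g_k, s_i)|$ for $i < k$, and the largest deviation from $(g_k, s_i) = (g_i, s_i)$ for $k \le i$.

```python
# Minimal sketch: A-conjugacy (1) and the orthogonality relations (2) of
# theorem 3.1 for the rank one method with exact line search (illustrative data).

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(M, u):
    return [dot(row, u) for row in M]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]


def grad(x):
    return [ax + bi for ax, bi in zip(matvec(A, x), b)]


n = 3
x = [0.0, 0.0, 0.0]
V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # V^(0) = I
sigmas, dirs, grads = [], [], [grad(x)]
for _ in range(n):
    g = grads[-1]
    s = [-v for v in matvec(V, g)]
    alpha = -dot(s, g) / dot(s, matvec(A, s))          # exact line search
    sigma = [alpha * si for si in s]
    x = [xi + di for xi, di in zip(x, sigma)]
    g_new = grad(x)
    y = [a - c for a, c in zip(g_new, g)]
    r = [vy - di for vy, di in zip(matvec(V, y), sigma)]
    ry = dot(r, y)
    if abs(ry) > 1e-12 * (dot(r, r) * dot(y, y)) ** 0.5:
        V = [[V[i][j] - r[i] * r[j] / ry for j in range(n)] for i in range(n)]
    sigmas.append(sigma)
    dirs.append(s)
    grads.append(g_new)

conj = max(abs(dot(sigmas[i], matvec(A, sigmas[j])))
           for i in range(n) for j in range(n) if i != j)
orth = max(abs(dot(grads[k], dirs[i])) for k in range(n + 1) for i in range(k))
second = max(abs(dot(grads[k], dirs[i]) - dot(grads[i], dirs[i]))
             for i in range(n) for k in range(i + 1))
print(conj, orth, second)
```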


We see from the preceding theorem that this method is a conjugate direction method. In light of the remarks at the beginning of this chapter, the question arises as to how our method is related to the conjugate gradient and DFP techniques. Since our method is a conjugate direction method, we must have, if $\gamma_n \ne -1$ for all $n$, that the $\sigma_n$'s are linearly independent. For if $\{\sigma_i\}_{i=0}^{l}$ are linearly dependent, then there exist scalars $\beta_i$, not all zero, such that

$$\sum_{i=0}^{l}\beta_i\sigma_i = \theta. \qquad (7)$$

So if $j \le l$, we have from (7) that $0 = \sum_{i=0}^{l}\beta_i(\sigma_j, A\sigma_i)$, which implies that $\beta_j(\sigma_j, A\sigma_j) = 0$. Hence, $\beta_j = 0$, since $\sigma_j \ne \theta$ and since $A$ is strongly positive, which is a contradiction. Since $\sigma_n = \alpha_n s_n$, the $s_n$'s are linearly independent.

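Notice also that $V^{(0)} g_0 = V^{(0)} \cdot 1 \cdot g_0$. If we choose $c_{00} = 1$, then $V^{(0)} g_0 = V^{(0)}(c_{00} g_0)$, and, since $r_0 = V^{(0)}\big(g_1 - (1 - \alpha_0) g_0\big)$,

$$V^{(1)} g_1 = V^{(0)} g_1 - \frac{(r_0, g_1)}{(r_0, y_0)}\, V^{(0)}\big(g_1 - (1 - \alpha_0) g_0\big) = V^{(0)}\left[\Big(1 - \frac{(r_0, g_1)}{(r_0, y_0)}\Big) g_1 + \frac{(1 - \alpha_0)(r_0, g_1)}{(r_0, y_0)}\, g_0\right];$$

that is, $V^{(1)} g_1 = V^{(0)}\sum_{i=0}^{1} c_{i1} g_i$ for the scalars

$$c_{01} = \frac{(1 - \alpha_0)(r_0, g_1)}{(r_0, y_0)} \quad\text{and}\quad c_{11} = 1 - \frac{(r_0, g_1)}{(r_0, y_0)}.$$

The above suggests that for every $n$ there exist scalars $c_{in}$, $i = 0, 1, \ldots, n$, such that

$$V^{(n)} g_n = V^{(0)}\sum_{i=0}^{n} c_{in} g_i. \qquad (8)$$

We shall now establish (8) and find a convenient way to express the $c_{in}$'s.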

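Theorem 3.2: If $\gamma_j \ne -1$ for $j = 0, 1, 2, \ldots$, then for every integer $k$ there exist scalars $a_{ik}$, $b_{ik}$, $c_{ik}$, $i = 0, 1, \ldots, k$, such that

$$V^{(k)} y_k = V^{(0)}\Big[\sum_{i=0}^{k} a_{ik} y_i + \sum_{i=0}^{k-1} b_{ik} g_i\Big] \qquad (9)$$

and

$$V^{(k)} g_k = V^{(0)}\sum_{i=0}^{k} c_{ik} g_i, \qquad (10)$$

where $y_k = g_{k+1} - g_k$.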


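Proof: (By mathematical induction)

$$V^{(0)} y_0 = V^{(0)}(1)\, y_0,$$

so $a_{00} = 1$. Next,

$$\begin{aligned}
V^{(1)} y_1 &= V^{(0)} y_1 - \frac{(r_0, y_1)}{(r_0, y_0)}\, r_0 && \text{by (2.10)}\\
&= V^{(0)} y_1 - \frac{(r_0, y_1)}{(r_0, y_0)}\big(V^{(0)} y_0 + \alpha_0 V^{(0)} g_0\big) && \text{by (2.5)}\\
&= V^{(0)}\Big[\sum_{i=0}^{1} a_{i1} y_i + \sum_{i=0}^{0} b_{i1} g_i\Big],
\end{aligned}$$

where $a_{11} = 1$, $a_{01} = -\dfrac{(r_0, y_1)}{(r_0, y_0)}$, and $b_{01} = -\alpha_0\dfrac{(r_0, y_1)}{(r_0, y_0)}$. Moreover,

$$V^{(1)} g_1 = V^{(0)}\sum_{i=0}^{1} c_{i1} g_i,$$

where $c_{11} = 1 - \dfrac{(r_0, g_1)}{(r_0, y_0)}$ and $c_{01} = \dfrac{(1 - \alpha_0)(r_0, g_1)}{(r_0, y_0)}$, as shown in the previous paragraph. The induction assumption is: there exist $a_{ji}$, $b_{ji}$, and $c_{ji}$, $i = 0, 1, 2, \ldots, k$, $j = 0, 1, 2, \ldots, i$, such that

$$V^{(i)} y_i = V^{(0)}\Big[\sum_{j=0}^{i} a_{ji} y_j + \sum_{j=0}^{i-1} b_{ji} g_j\Big] \qquad (11)$$

and

$$V^{(i)} g_i = V^{(0)}\sum_{j=0}^{i} c_{ji} g_j. \qquad (12)$$

Since

$$V^{(k+1)} y_{k+1} = V^{(k)} y_{k+1} - \frac{(r_k, y_{k+1})}{(r_k, y_k)}\big(V^{(k)} y_k + \alpha_k V^{(k)} g_k\big) \qquad \text{(by (2.5) and (2.10))}$$

and, by (2.9),

$$V^{(k)} y_{k+1} = V^{(0)} y_{k+1} - \sum_{j=0}^{k-1}\frac{(r_j, y_{k+1})}{(r_j, y_j)}\big(V^{(j)} y_j + \alpha_j V^{(j)} g_j\big),$$

every term on the right is, by (11) and (12), $V^{(0)}$ applied to a linear combination of the $y_i$'s and $g_i$'s. Therefore, (9) is established for $k + 1$ if (11) and (12) hold for $k$. Also,

$$\begin{aligned}
V^{(k+1)} g_{k+1} &= V^{(k+1)} y_k + V^{(k+1)} g_k && \text{(since } y_k = g_{k+1} - g_k\text{)}\\
&= V^{(k)} y_k - \big(V^{(k)} y_k + \alpha_k V^{(k)} g_k\big) + V^{(k+1)} g_k && \text{(by (2.10))}\\
&= -\alpha_k V^{(k)} g_k + V^{(k+1)} g_k. \qquad (13)
\end{aligned}$$

Now let us consider

$$V^{(k+1)} g_k = V^{(k)} g_k - \frac{(r_k, g_k)}{(r_k, y_k)}\big(V^{(k)} y_k + \alpha_k V^{(k)} g_k\big) \qquad \text{(by (2.10))}. \qquad (14)$$

Using $y_i = g_{i+1} - g_i$ in (11), substituting (14) back into (13), and applying (12), we have

$$V^{(k+1)} g_{k+1} = V^{(0)}\sum_{i=0}^{k+1} c_{i,k+1}\, g_i.$$

Hence, (10) is established for $k + 1$, and the theorem is proved.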

In order to establish the relationship between the three conjugate direction methods, we wish to find an expression for $c_{ik}$ in terms of the $g_i$'s and $V^{(0)}$. From (2), $(g_k, s_i) = 0$ if $i < k$. Hence, $-(g_k, V^{(i)} g_i) = 0$ if $i < k$. From (8) we have

$$\sum_{l=0}^{i} c_{li}\,(g_k, V^{(0)} g_l) = 0, \qquad i < k. \qquad (15)$$

Let us fix $k \ge 1$ and notice that if $i = 0$, since $-s_0 = V^{(0)} g_0 = V^{(0)}(1)\, g_0$, we have $c_{00} = 1$. Hence, by (15) with $i = 0$ we have that $(g_k, V^{(0)} g_0) = 0$. But this is also true from (2), since $V^{(0)} g_0 = -s_0$. We consider (15) with $i = 1$ and have

$$0 = c_{01}(g_k, V^{(0)} g_0) + c_{11}(g_k, V^{(0)} g_1) = c_{11}(g_k, V^{(0)} g_1),$$

since $(g_k, V^{(0)} g_0) = 0$. Now if $c_{11} = 0$, then $s_1 = -c_{01} V^{(0)} g_0 = c_{01} s_0$; but we observed before that $s_0, s_1$ are linearly independent. So it must be true that $(g_k, V^{(0)} g_1) = 0$. Moreover, from (8) we have $-s_1 = -c_{01} s_0 + c_{11} V^{(0)} g_1$, so $V^{(0)} g_1 \in S(s_0, s_1)$. By induction,

$$V^{(0)} g_0, V^{(0)} g_1, \ldots, V^{(0)} g_{n-1} \in S(s_0, s_1, \ldots, s_{n-1}), \qquad (16)$$

where $S(s_0, s_1, \ldots, s_{n-1})$ denotes the subspace spanned by the $s_i$'s.

Let us assume that $(g_k, V^{(0)} g_l) = 0$ for all $l = 0, 1, \ldots, n - 1$, for $n < k$. By (2) and the induction hypothesis we have

$$0 = (g_k, V^{(n)} g_n) = \sum_{l=0}^{n} c_{ln}(g_k, V^{(0)} g_l) = c_{nn}(g_k, V^{(0)} g_n).$$

If $c_{nn} = 0$, we must have from (8) that $-s_n = V^{(0)}\sum_{l=0}^{n-1} c_{ln} g_l$, so that $s_n \in S(s_0, s_1, \ldots, s_{n-1})$ by (16). But this implies that $\{s_i\}_{i=0}^{n}$ is a linearly dependent set of vectors, which contradicts the remarks following theorem 3.1. Hence, $(g_k, V^{(0)} g_l) = 0$ for all $l = 0, 1, \ldots, k - 1$, and we have from (2), if $0 \le l \le i$, that

$$(V^{(i)} g_i, g_l) = -(g_l, s_i) = (g_i, V^{(i)} g_i).$$

Therefore

$$(V^{(i)} g_i, g_l) = \sum_{j=0}^{i} c_{ji}\,(g_j, V^{(0)} g_l),$$

and for all $j \ne l$, $(g_j, V^{(0)} g_l) = 0$. So we have

$$(g_i, V^{(i)} g_i) = c_{li}\,(g_l, V^{(0)} g_l).$$

Hence,

$$c_{li} = \frac{(g_i, V^{(i)} g_i)}{(g_l, V^{(0)} g_l)},$$

which implies that

$$-s_i = V^{(i)} g_i = V^{(0)}\sum_{l=0}^{i}\frac{(g_i, V^{(i)} g_i)}{(g_l, V^{(0)} g_l)}\, g_l.$$

Therefore

$$-s_i = (g_i, V^{(i)} g_i)\, V^{(0)}\sum_{l=0}^{i}\frac{g_l}{(g_l, V^{(0)} g_l)}. \qquad (17)$$


Hence, we can state the following theorem.

Theorem 3.3: If $\gamma_n \ne -1$ for all $n$, $\alpha_n$ is chosen as in (2.17), and $V^{(0)} = H^{(0)}$ of the DFP method, then the search directions of the DFP method and of the Davidon-Broyden method with $\alpha_n$ chosen by (2.17) are the same. Moreover, if $V^{(0)} = H^{(0)} = I$, then these search directions are the same as those of the conjugate gradient method.

Proof: Horwitz and Sarachik [20] have shown that for the DFP method the $i$th search direction is given by

$$-s_i = (g_i, H^{(i)} g_i)\, H^{(0)}\sum_{l=0}^{i}\frac{g_l}{(g_l, H^{(0)} g_l)}.$$

If $H^{(0)} = V^{(0)}$, it follows from (17) that the directions are the same (up to scalar factors, which do not affect the exact line search). In [19] it was shown that for the method of conjugate gradients the $i$th search direction is

$$-s_i = (g_i, g_i)\sum_{l=0}^{i}\frac{g_l}{(g_l, g_l)}.$$

At each point $x_n$ the three methods generate a direction $s_n$; then the stepsize is chosen so that the function $J(x_n + \lambda s_n)$ is minimized with respect to $\lambda$. Since the directions are the same and the stepsize is chosen in the same fashion for each method, the sequences of iterates $x_0, x_1, x_2, \ldots$ generated by these methods will be the same. Again, we restate that throughout chapter 3 the functional to be minimized is quadratic as outlined in chapter 1.
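Theorem 3.3 can be checked by running the rank one method with $V^{(0)} = I$ side by side with the conjugate gradient method in the Fletcher-Reeves form $d_i = -g_i + \frac{(g_i, g_i)}{(g_{i-1}, g_{i-1})} d_{i-1}$, both with exact line search. This is a minimal sketch under stated assumptions: $H = \mathbb{R}^3$, illustrative data, the symmetric rank one update $V^{(k+1)} = V^{(k)} - r_k r_k^{T}/(r_k, y_k)$, and a hypothetical helper `line_search_step`. It reports the largest difference between the two sequences of iterates and the smallest $|\cos|$ of the angle between corresponding directions.

```python
# Minimal sketch: the rank one method with V^(0) = I and the Fletcher-Reeves
# conjugate gradient method generate the same iterates (illustrative data).

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(M, u):
    return [dot(row, u) for row in M]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]
n = 3


def grad(x):
    return [ax + bi for ax, bi in zip(matvec(A, x), b)]


def line_search_step(x, d):
    # x + alpha d with alpha = -(d, g)/(d, A d), the exact minimizer along d
    g = grad(x)
    alpha = -dot(d, g) / dot(d, matvec(A, d))
    return [xi + alpha * di for xi, di in zip(x, d)]


# Rank one method with V^(0) = I.
x = [0.0] * n
V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
xs_r1, dirs_r1 = [x], []
for _ in range(n):
    g = grad(x)
    s = [-v for v in matvec(V, g)]
    x_new = line_search_step(x, s)
    sigma = [a - c for a, c in zip(x_new, x)]
    y = [a - c for a, c in zip(grad(x_new), g)]
    r = [vy - si for vy, si in zip(matvec(V, y), sigma)]
    ry = dot(r, y)
    if abs(ry) > 1e-12 * (dot(r, r) * dot(y, y)) ** 0.5:
        V = [[V[i][j] - r[i] * r[j] / ry for j in range(n)] for i in range(n)]
    x = x_new
    xs_r1.append(x)
    dirs_r1.append(s)

# Fletcher-Reeves conjugate gradients.
x = [0.0] * n
g = grad(x)
d = [-gi for gi in g]
xs_cg, dirs_cg = [x], []
for _ in range(n):
    dirs_cg.append(d)
    x = line_search_step(x, d)
    g_new = grad(x)
    beta = dot(g_new, g_new) / dot(g, g)
    d = [-gn + beta * di for gn, di in zip(g_new, d)]
    g = g_new
    xs_cg.append(x)

iterate_gap = max(abs(a - c) for p, q in zip(xs_r1, xs_cg) for a, c in zip(p, q))
min_cos = min(abs(dot(s, d)) / (dot(s, s) * dot(d, d)) ** 0.5
              for s, d in zip(dirs_r1, dirs_cg))
print(iterate_gap, min_cos)
```

Because each line search is exact, only the direction of $s_n$ matters, which is why the sketch compares angles rather than the raw vectors.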

It is well known that the rate of convergence to the minimum of a quadratic functional for the method of steepest descent is given by

$$J(x_i) - J(-A^{-1}b) \le \left(\frac{M - m}{M + m}\right)^{2i}\big(J(x_0) - J(-A^{-1}b)\big), \qquad i = 1, 2, \ldots, \qquad (18)$$

where $m$ and $M$ are given by (1.3). Daniel has established that the rate of convergence for the conjugate gradient algorithm is given by

$$J(x_i) - J(-A^{-1}b) \le 4\left(\frac{\sqrt{M} - \sqrt{m}}{\sqrt{M} + \sqrt{m}}\right)^{2i}\big(J(x_0) - J(-A^{-1}b)\big), \qquad i = 1, 2, \ldots \qquad (19)$$

(19) is a faster rate of convergence than (18), since $\frac{\sqrt{M} - \sqrt{m}}{\sqrt{M} + \sqrt{m}} < \frac{M - m}{M + m}$ whenever $m < M$.

Under the conditions of theorem 3.3 with $V^{(0)} = I$, we know that the iterates generated by our algorithm and those of the method of conjugate gradients are the same. Hence, the rate of convergence of our algorithm to the minimum is given by (19), and we have the following theorem:

Theorem 3.4: If for each $n$, $\alpha_n$ is chosen by (2.17), $\gamma_n \ne -1$, and $V^{(0)} = I$, then the rate of convergence for the algorithm outlined in chapter 2 is given by (19).
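The two error bounds can be compared directly. Assuming the standard forms of these estimates, namely the factor $((M - m)/(M + m))^{2i}$ for steepest descent and $4((\sqrt{M} - \sqrt{m})/(\sqrt{M} + \sqrt{m}))^{2i}$ for conjugate gradients, the minimal sketch below evaluates both for an illustrative ratio $M/m = 100$. The function names are hypothetical. Because of the constant 4, the conjugate gradient bound is the smaller one only after the first few iterations, although its per-step ratio is always smaller when $m < M$.

```python
# Minimal sketch: compare the steepest descent and conjugate gradient
# error-reduction factors for illustrative spectral bounds m and M.
import math


def steepest_descent_factor(m, M, i):
    # ((M - m)/(M + m))^(2i)
    return ((M - m) / (M + m)) ** (2 * i)


def conjugate_gradient_factor(m, M, i):
    # 4 ((sqrt M - sqrt m)/(sqrt M + sqrt m))^(2i)
    q = (math.sqrt(M) - math.sqrt(m)) / (math.sqrt(M) + math.sqrt(m))
    return 4.0 * q ** (2 * i)


m, M = 1.0, 100.0
for i in (1, 5, 10, 50):
    print(i, steepest_descent_factor(m, M, i), conjugate_gradient_factor(m, M, i))
```

For $i = 10$ the steepest descent factor is about 0.67, while the conjugate gradient factor is about 0.072.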
4. EXTENSION OF POWELL'S IDEA

In this chapter we shall extend an idea of Powell, concerning the basic algorithm as outlined in chapter 2, to a separable infinite dimensional Hilbert space. The idea is to use the rank one algorithm of chapter 2, but with search directions which are independent of the gradient. Specifically, we wish to compute the location of the minimum of a differentiable functional $J: H \to \mathbb{R}$. We let $V^{(0)}$ be a strongly positive, self-adjoint, bounded linear operator, as in chapter 2, and let $x_0 \in H$ be the initial estimate of the location of the minimum. Further, let $p$ be an arbitrary positive integer. If the dimension of $H$ is finite, it is advantageous to let $p = \dim H$.

Let $\{\sigma_i\}_{i=0}^{\infty} \subset H$ represent a basis for $H$. Compute $J(x_0)$ and $g_0$, and proceed as follows.

Step 1: Let

$$x^* = x_n + \sigma_n \qquad (1)$$

and compute $J(x^*)$ and $g^*$. If $\|g^*\| = 0$, then $x^*$ satisfies the necessary condition for a minimum, and we stop. Otherwise,

Step 2: Compute the residual vector as in chapter 2. Let $r_n = V^{(n)} y_n - \sigma_n$, where $y_n = g^* - g_n$, and compute the scalars

$$\mu_n = (g^*, r_n)$$




7n = 


(gn> r ri) 



if % + - 1 

if = - 1 
n j 


( 2 ) 



Step 3: If $V^{(n)} y_n = \sigma_n$, let $V^{(n+1)} = V^{(n)}$; otherwise let

y(n+l) _ y(n) + ^ ;g(n) 


( 5 ) 


n 


where $B^{(n)}: H \to H$ is defined such that for all $x \in H$

$$B^{(n)} x = (x, r_n)\, r_n. \qquad (4)$$

Step 4: If $J(x^*) < J(x_n)$, let $x_{n+1} = x^*$. Otherwise, let $x_{n+1} = x_n$. If $n = pk$ for some integer $k$, then let

$$z_k = x_0 - V^{(n)} g_0. \qquad (5)$$

Evaluate $J(z_k)$ and $g(z_k)$, and if $\|g(z_k)\| = 0$ stop. Otherwise, return to step 1.
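In finite dimensions the algorithm above can be tried out directly. The minimal sketch below assumes $H = \mathbb{R}^3$, $p = 3$, prescribed directions $\sigma_n$ equal to the standard basis vectors, $V^{(0)} = I$, and an illustrative quadratic $J(x) = (b, x) + \frac{1}{2}(x, Ax)$. The coefficient of $B^{(n)}$ in step 3 is taken as $-1/(r_n, y_n)$. This is the only multiple of $B^{(n)}$ for which $V^{(n+1)} y_n = \sigma_n$ (the property of theorem 4.4), since $(V^{(n)} + c B^{(n)})\, y_n - \sigma_n = \big(1 + c\,(r_n, y_n)\big)\, r_n$. After one cycle of $p$ steps, $z_1 = x_0 - V^{(p)} g_0$ should be the minimizer.

```python
# Minimal sketch of the algorithm with prescribed directions (Powell's idea)
# in R^3. Assumptions: sigma_n = e_n, V^(0) = I, rank one coefficient -1/(r, y).

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(M, u):
    return [dot(row, u) for row in M]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]


def J(x):
    return dot(b, x) + 0.5 * dot(x, matvec(A, x))


def grad(x):
    return [ax + bi for ax, bi in zip(matvec(A, x), b)]


p = 3
x0 = [0.2, 0.4, -0.1]
x = list(x0)
V = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]   # V^(0) = I
for n in range(p):
    sigma = [1.0 if i == n else 0.0 for i in range(p)]     # prescribed sigma_n = e_n
    x_star = [xi + si for xi, si in zip(x, sigma)]         # step 1, (1)
    y = [a - c for a, c in zip(grad(x_star), grad(x))]     # y_n = g* - g_n
    r = [vy - si for vy, si in zip(matvec(V, y), sigma)]   # step 2
    ry = dot(r, y)
    if abs(ry) > 1e-12 * (dot(r, r) * dot(y, y)) ** 0.5:    # step 3: skip if r = 0
        V = [[V[i][j] - r[i] * r[j] / ry for j in range(p)] for i in range(p)]
    if J(x_star) < J(x):                                   # step 4
        x = x_star

z1 = [xi - v for xi, v in zip(x0, matvec(V, grad(x0)))]    # (5) with k = 1, n = p
print(z1, max(abs(t) for t in grad(z1)))
```

The update of $V^{(n)}$ does not depend on whether the trial point is accepted, because $y_n = A\sigma_n$ for a quadratic functional in either case.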

We shall show that $z_k$ converges in norm as $k \to \infty$ to the location of the minimum of a quadratic functional. For an infinite dimensional Hilbert space, the integer $p$ determines the frequency with which we apply the Newton-like iteration $z_k = x_0 - V^{(n)} g_0$ with $pk = n$.

With this modification of the basic algorithm, we can prove many theorems which are analogous to those of chapter 2. Henceforth, as in chapter 2, we shall assume that the functional to be minimized is quadratic. That is,

$$J(x) = J_0 + (b, x) + \tfrac{1}{2}(x, Ax), \qquad (6)$$
where $A$ is as in (1.3). Theorems 4.1, 4.2, and 4.4 are independent of the type of functional being minimized.

Theorem 4.1: $B^{(n)}$, as defined in (4), is a self-adjoint positive operator for all $n$.

Proof: As in chapter 2.

Theorem 4.2: $V^{(n)}$ is self-adjoint for all $n$.

Proof: As in chapter 2.

With the next two theorems we see that the properties of $V^{(n)}$ given in theorems 2.3, 2.4, and their corollaries hold even though $\sigma_n$ is a prescribed vector independent of $V^{(n)}$, $\alpha_n$, and $g_n$.

Theorem 4.3: If $A^{-1}u = V^{(n)}u$ for some $u \in H$, and $B: H \to H$ is such that there exists some scalar $\mu$ with $B - V^{(n)} = \mu B^{(n)}$, then $A^{-1}u = Bu$.

Proof: By (1.12) we know that $A^{-1} y_n = x^* - x_n = \sigma_n$, and by definition $r_n = V^{(n)} y_n - \sigma_n = (V^{(n)} - A^{-1})\, y_n$. If $(V^{(n)} - A^{-1})u = \theta$, then, since $V^{(n)} - A^{-1}$ is self-adjoint,

$$(r_n, u) = \big((V^{(n)} - A^{-1})\, y_n, u\big) = \big(y_n, (V^{(n)} - A^{-1})\, u\big) = (y_n, \theta) = 0.$$

Hence, if $B - V^{(n)} = \mu B^{(n)}$, then $(B - V^{(n)})u = \mu\,(u, r_n)\, r_n = \mu \cdot 0 \cdot r_n = \theta$. Since, by hypothesis, $V^{(n)}u = A^{-1}u$, we have $Bu = A^{-1}u$.

Corollary: If $V^{(n)}u = A^{-1}u$, then $V^{(n+1)}u = A^{-1}u$.

Theorem 4.4: $V^{(n+1)} y_n = \sigma_n$, if $\gamma_n \ne -1$.

Proof: If $V^{(n)} y_n = \sigma_n$, then $V^{(n+1)} = V^{(n)}$ by step 3, and the theorem is obvious. Otherwise, using (5) and (6) we have

v (m-l)y n _ an = v (n)y n + ^n . r X) (r n ,y n )r n - a n . 

M n 
Hence, using (1), (2), and (4),

Corollary: $V^{(n)} y_i = \sigma_i$ for $i < n$, if $\gamma_i \ne -1$.

theorem 4.5: If 7 n ± - 1 then (A n - 1)A% » - (r n ,y n ) _;L . 

Proof : Formally the same as the proof of the corresponding theorem 

in chapter 2 in spite of the change in the definition of cr n . 

Theorem 4.6: If $V^{(0)} \ge A^{-1}$, then $V^{(0)} \ge V^{(1)} \ge \cdots \ge V^{(n)} \ge \cdots \ge A^{-1}$, and similarly, if $V^{(0)} \le A^{-1}$, then $V^{(0)} \le V^{(1)} \le \cdots \le V^{(n)} \le \cdots \le A^{-1}$.

Proof: The proof formally follows the proofs of theorems 2.6 and 2.7 and is based on theorem 4.5, on $A^{-1} y_n = \sigma_n$, that is, (1.12), and on the Schwarz inequality. If $x \in H$, $V^{(0)} \ge A^{-1}$, and $V^{(n)} \ge A^{-1}$, then by (2.10), (2.15), and the Schwarz inequality

Also from (2.10),

$$\big(x, (V^{(n+1)} - V^{(n)})\, x\big) = -\frac{(x, r_n)^2}{\big(y_n, (V^{(n)} - A^{-1})\, y_n\big)} \le 0.$$
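The monotonicity in theorem 4.6 can be observed on a small example. The minimal sketch below assumes $H = \mathbb{R}^2$, an illustrative $A$ whose inverse is written out explicitly (both eigenvalues of $A$ exceed 1, so $V^{(0)} = I \ge A^{-1}$), prescribed directions $\sigma_0 = e_0$, $\sigma_1 = e_1$, and the rank one coefficient $-1/(r_n, y_n)$. For random test vectors $x$ it checks that the quadratic forms $(x, V^{(n)} x)$ never increase and stay above $(x, A^{-1} x)$; after two steps $V^{(2)}$ equals $A^{-1}$ exactly in exact arithmetic.

```python
# Minimal sketch: (x, V^(n) x) decreases monotonically towards (x, A^{-1} x)
# when V^(0) >= A^{-1} (theorem 4.6). Illustrative 2 x 2 data.
import random


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(M, u):
    return [dot(row, u) for row in M]


def quad(M, u):
    return dot(u, matvec(M, u))


A = [[4.0, 1.0], [1.0, 3.0]]
A_inv = [[3.0 / 11.0, -1.0 / 11.0], [-1.0 / 11.0, 4.0 / 11.0]]   # inverse of A

V = [[1.0, 0.0], [0.0, 1.0]]                                    # V^(0) = I
history = [V]
for sigma in ([1.0, 0.0], [0.0, 1.0]):                          # prescribed sigma_n
    y = matvec(A, sigma)                                        # y_n = A sigma_n (1.12)
    r = [vy - s for vy, s in zip(matvec(V, y), sigma)]
    ry = dot(r, y)
    V = [[V[i][j] - r[i] * r[j] / ry for j in range(2)] for i in range(2)]
    history.append(V)

rng = random.Random(0)
ok = True
for _ in range(100):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    forms = [quad(M, x) for M in history]
    ok &= all(a >= c - 1e-12 for a, c in zip(forms, forms[1:]))   # non-increasing
    ok &= forms[-1] >= quad(A_inv, x) - 1e-12                     # bounded below
print(ok, history[-1])
```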

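We now wish to establish a convergence theorem for this modification of the basic algorithm. Since the set $\{\sigma_i\}$ is a basis for $H$, for each $x \in H$ there exist scalars $c_i \in \mathbb{R}$, $i = 0, 1, \ldots$, such that

$$A^{-1}x = \sum_{i=0}^{\infty} c_i\sigma_i, \qquad (7)$$

or $x = \sum_{i=0}^{\infty} c_i A\sigma_i$. Since it is known that $A\sigma_i = y_i$ (1.12), where $y_i = g^* - g_i$, we have

$$x = \sum_{i=0}^{\infty} c_i y_i. \qquad (8)$$

Then

$$V^{(n)} x = \sum_{i=0}^{\infty} c_i V^{(n)} y_i. \qquad (9)$$

By the corollary to theorem 4.4, if $\gamma_j \ne -1$, $j = 0, 1, 2, \ldots, n$, we have $V^{(n)} y_i = \sigma_i$ for all $i < n$. So (9) becomes

$$V^{(n)} x = \sum_{i=0}^{n-1} c_i\sigma_i + V^{(n)}\sum_{i=n}^{\infty} c_i y_i. \qquad (10)$$

Therefore, by (7) and (10) we have

$$\|A^{-1}x - V^{(n)}x\| = \Big\|\sum_{i=0}^{\infty} c_i\sigma_i - \sum_{i=0}^{n-1} c_i\sigma_i - V^{(n)}\sum_{i=n}^{\infty} c_i y_i\Big\| = \Big\|(A^{-1} - V^{(n)})\sum_{i=n}^{\infty} c_i y_i\Big\| \le \|A^{-1} - V^{(n)}\|\,\Big\|\sum_{i=n}^{\infty} c_i y_i\Big\|. \qquad (11)$$

If the $V^{(n)}$ are uniformly bounded, then $\|A^{-1} - V^{(n)}\|$ is bounded. By (8),

$$\Big\|\sum_{i=n}^{\infty} c_i y_i\Big\| \to 0 \quad \text{as } n \to \infty,$$

for this is the $n$th remainder of the expansion of $x$ in terms of the $y_i$'s. Therefore, we have $\|A^{-1}x - V^{(n)}x\| \to 0$ as $n \to \infty$. Hence, we have the following: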

Theorem 4.7: If the $V^{(n)}$ are uniformly bounded and $\gamma_n \ne -1$ for all $n$, then $V^{(n)} \to A^{-1}$ pointwise.

Corollary: If $z_k = x_0 - V^{(n)} g_0$, where $pk = n$, then $z_k$ converges to the location of the minimum as $k \to \infty$.

Proof: In chapter 1 it was shown that the location of the minimum of the quadratic functional is $-A^{-1}b$. Hence

$$\begin{aligned}
\|z_k + A^{-1}b\| &= \|x_0 - V^{(n)} g_0 + A^{-1}b\|\\
&= \|x_0 - V^{(n)}(A x_0 + b) + A^{-1}b\| && \text{(by (1.7))}\\
&\le \|x_0 - V^{(n)} A x_0\| + \|A^{-1}b - V^{(n)} b\|. \qquad (12)
\end{aligned}$$

By theorem 4.7, $V^{(n)}(A x_0) \to A^{-1}(A x_0) = x_0$ and $V^{(n)} b \to A^{-1}b$. Hence, $z_k \to -A^{-1}b$ as $k \to \infty$.

The above theorem and its corollary establish the convergence to the location of the minimum of the quadratic functional for this modification of the algorithm. As noted earlier, the search directions here are prescribed and are independent of $\alpha_n$, $g_n$, and $V^{(n)}$. The rate of convergence could perhaps be improved by letting $z_k = x_n - V^{(n)} g_n$, where $pk = n$.


Notice that if $H$ is finite dimensional, then $z_1$ is the location of the minimum. This follows since, by theorems 4.3 and 4.4 and their corollaries, $A^{-1}$ and $V^{(p)}$ agree on the vectors $y_i = A\sigma_i$, $i = 0, 1, \ldots, p - 1$, which form a basis for $H$. Hence $V^{(p)} = A^{-1}$. Therefore,

$$\begin{aligned}
z_1 &= x_0 - V^{(p)} g_0 && \text{(by definition)}\\
&= x_0 - A^{-1} g_0\\
&= x_0 - A^{-1}(A x_0 + b)\\
&= -A^{-1}b && \text{(by (1.7))},
\end{aligned}$$

and by theorem 1.2, $-A^{-1}b$ is the location of the minimum of the quadratic functional $J$ defined in chapter 1. This is the idea due to Powell mentioned in the opening sentence of this chapter.
5. CONSTRAINTS

In this chapter we shall consider the problem of computing the location of the minimum of a differentiable functional defined on a real Hilbert space $H$, subject to linear equality constraints. It is shown how this problem can be attacked by a modification of the rank one, quasi-Newton algorithm outlined in section 1 of chapter 2.

5.1 Minimization on a Closed Linear Subspace

We shall assume that $J: H \to \mathbb{R}$ is a differentiable functional and that $D$ is a closed linear subspace of $H$. We wish to find $\bar{x} \in D$ such that $J(\bar{x}) \le J(x)$ for all $x \in D$. Let $D^{\perp}$ denote the orthogonal complement of $D$, so that $H = D \oplus D^{\perp}$. Then for any $x \in H$ there exist unique $x_D \in D$ and $x_{D^{\perp}} \in D^{\perp}$ such that $x = x_D + x_{D^{\perp}}$. Therefore, we can define an operator

$$P: H \to D \qquad (1)$$

such that $P(x) = x_D$ for each $x \in H$. $P$ is called the projection operator of $H$ onto $D$. It is known [1] that $P$ is linear, self-adjoint, bounded, and

$$P^2 = P. \qquad (2)$$

Moreover, by (2), for all $z \in H$,

$$(z, Pz) = (z, P^2 z) = (Pz, Pz) = \|Pz\|^2. \qquad (3)$$



Lemma 5.1: If we apply the basic algorithm outlined in chapter 2 with $V^{(0)} = P$, the projection operator defined in (1), then $V^{(k)} = V^{(0)} V^{(k)} V^{(0)}$ and $V^{(0)} r_k = r_k$ for all $k$, where $r_k = V^{(k)}(y_k + \alpha_k g_k)$.

Proof: (By mathematical induction) Since $V^{(0)} = P$, we have

$$V^{(0)} \cdot V^{(0)} = V^{(0)}. \qquad (4)$$

Hence,

$$\begin{aligned}
V^{(0)} r_0 &= V^{(0)}\big(V^{(0)}(y_0 + \alpha_0 g_0)\big) && \text{(by (2.3))}\\
&= V^{(0)}(y_0 + \alpha_0 g_0) && \text{(by (4))}\\
&= r_0 && \text{(by (2.3))}.
\end{aligned}$$

Also, $V^{(0)} V^{(0)} V^{(0)} = V^{(0)}$ by (4). Hence, the lemma is true for $k = 0$. Assume that

$$V^{(0)} V^{(k)} V^{(0)} = V^{(k)}. \qquad (5)$$

By applying (2.3), (5), and (4), we have

$$\begin{aligned}
V^{(0)} r_k &= V^{(0)}\big(V^{(k)}(y_k + \alpha_k g_k)\big)\\
&= V^{(0)} V^{(0)} V^{(k)} V^{(0)}(y_k + \alpha_k g_k)\\
&= V^{(0)} V^{(k)} V^{(0)}(y_k + \alpha_k g_k)\\
&= V^{(k)}(y_k + \alpha_k g_k) = r_k. \qquad (6)
\end{aligned}$$
If $V^{(k+1)} = V^{(k)}$, the lemma is true. Otherwise

$$V^{(k+1)} = V^{(k)} - \frac{|r_k\rangle\langle r_k|}{(r_k, y_k)} \quad \text{(by (2.9))}, \qquad (7)$$

where the operator given in (2.10) is written in dyadic notation [12]. Hence,

$$V^{(0)} V^{(k+1)} V^{(0)} = V^{(0)} V^{(k)} V^{(0)} - \frac{V^{(0)}|r_k\rangle\langle r_k|V^{(0)}}{(r_k, y_k)}. \qquad (8)$$

By applying (5) and (6) to the right hand side of (8), we have

$$V^{(0)} V^{(k+1)} V^{(0)} = V^{(k)} - \frac{|r_k\rangle\langle r_k|}{(r_k, y_k)} = V^{(k+1)} \quad \text{(by (2.9))}.$$

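Lemma 5.2: If $V^{(0)} = P$, the projection operator defined in (1), then for any $z \in H$ we have $V^{(0)} V^{(k)} z = V^{(k)} V^{(0)} z = V^{(k)} z$.

Proof: By lemma 5.1 and (4), we have for any $z \in H$

$$V^{(0)} V^{(k)} z = V^{(0)}\big(V^{(0)} V^{(k)} V^{(0)}\big) z = V^{(0)} V^{(k)} V^{(0)} z = V^{(k)} z,$$

and similarly $V^{(k)} V^{(0)} z = V^{(0)} V^{(k)} V^{(0)} V^{(0)} z = V^{(0)} V^{(k)} V^{(0)} z = V^{(k)} z$.

Notice that the proofs of the two lemmas above required only that $V^{(0)} \cdot V^{(0)} = V^{(0)}$.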

Theorem 5.1: If the initial estimate $x_0$ of the location of the constrained minimum of $J$ is an element of $D$ and $V^{(0)} = P$, the projection operator onto $D$ defined in (1), then the iterates $x_1, x_2, \ldots, x_n, \ldots$ generated by the basic algorithm outlined in section 1 of chapter 2 are all elements of $D$.

Proof: (By mathematical induction) Since the $x_1$ generated by the basic algorithm is either $x_0$ or $x^* = x_0 - \alpha_0 V^{(0)} g_0$ by (2.1), and $x_0 \in D$ by hypothesis, we only need to show that $x^* \in D$ in order to establish the theorem for $k = 1$. But since $V^{(0)}$ is the projection operator onto $D$, we have $V^{(0)} g_0 \in D$, and since $D$ is a subspace, we obtain $x_0 - \alpha_0 V^{(0)} g_0 \in D$ for any $\alpha_0 \in \mathbb{R}$.

Assume $x_k \in D$ and consider $x^* = x_k - \alpha_k V^{(k)} g_k = x_k - \alpha_k V^{(0)}(V^{(k)} g_k)$ (lemma 5.2). Because $V^{(0)}$ is the projection operator, $V^{(0)}(V^{(k)} g_k) \in D$. Hence, $x^* = x_k - \alpha_k V^{(k)} g_k \in D$ for all $\alpha_k \in \mathbb{R}$, and $x_{k+1}$, which is either $x_k$ or $x^*$, lies in $D$.
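Theorem 5.1 can be illustrated with a one-constraint example in $\mathbb{R}^3$. The minimal sketch below assumes $D = \{x : x_1 + x_2 + x_3 = 0\}$, so that $P = I - \frac{1}{3}\mathbf{1}\mathbf{1}^{T}$, takes $V^{(0)} = P$ and $x_0 = \theta \in D$, and uses exact line search with the symmetric rank one update $V^{(k+1)} = V^{(k)} - r_k r_k^{T}/(r_k, y_k)$, which is an assumption about the form of (2.9). The data are illustrative. It checks that every iterate satisfies the constraint and that the projected gradient $P g$ vanishes after $\dim D = 2$ steps, i.e. that the constrained minimum has been reached.

```python
# Minimal sketch: with V^(0) = P (projection onto D) the rank one iterates
# stay in D = {x : x_1 + x_2 + x_3 = 0} (theorem 5.1). Illustrative data.

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(M, u):
    return [dot(row, u) for row in M]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]
n = 3


def grad(x):
    return [ax + bi for ax, bi in zip(matvec(A, x), b)]


P = [[(1.0 if i == j else 0.0) - 1.0 / 3.0 for j in range(n)] for i in range(n)]
x = [0.0, 0.0, 0.0]                      # x_0 in D
V = [row[:] for row in P]                # V^(0) = P
iterates = [x]
for _ in range(2):                       # dim D = 2 steps
    g = grad(x)
    s = [-v for v in matvec(V, g)]
    alpha = -dot(s, g) / dot(s, matvec(A, s))           # exact line search
    sigma = [alpha * si for si in s]
    x_new = [xi + di for xi, di in zip(x, sigma)]
    y = [a - c for a, c in zip(grad(x_new), g)]
    r = [vy - di for vy, di in zip(matvec(V, y), sigma)]
    ry = dot(r, y)
    if abs(ry) > 1e-12 * (dot(r, r) * dot(y, y)) ** 0.5:
        V = [[V[i][j] - r[i] * r[j] / ry for j in range(n)] for i in range(n)]
    x = x_new
    iterates.append(x)

constraint_violation = max(abs(sum(pt)) for pt in iterates)
projected_grad = matvec(P, grad(x))
print(constraint_violation, max(abs(t) for t in projected_grad))
```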

Notice that theorem 5.1 and the above lemmas are independent of the manner of choosing $\alpha_k$, and the functional $J$ is only required to be differentiable. Further, notice that the theorem and lemmas hold if, in (2.9), $V^{(n+1)} = V^{(n)} + \mu B^{(n)}$ for any real number $\mu$.

Now suppose that the functional to be minimized is quadratic as discussed in chapter 1. The problem is, therefore, to find the location of the minimum value of $J(x) = J_0 + (b, x) + \frac{1}{2}(x, Ax)$ for all $x \in D$, a closed linear subspace of $H$. Now if $P$ denotes the projection mapping of $H$ onto $D$ and we denote $I - P$ by $C$, then $C$ is bounded, and the problem becomes to minimize $J$ subject to $Cx = \theta$. Notice that the null space of $C$ is exactly $D$. If we make the substitution $x = y - A^{-1}b$, then

$$\begin{aligned}
J(x) &= J_0 + (-A^{-1}b + y, b) + \tfrac{1}{2}\big(-A^{-1}b + y, A(-A^{-1}b + y)\big)\\
&= J_0 - (A^{-1}b, b) + (y, b) + \tfrac{1}{2}(A^{-1}b, b) + \tfrac{1}{2}\big(y, A(-A^{-1}b)\big) + \tfrac{1}{2}(-A^{-1}b, Ay) + \tfrac{1}{2}(y, Ay)\\
&= J_0 - \tfrac{1}{2}(A^{-1}b, b) + \tfrac{1}{2}(y, Ay).
\end{aligned}$$
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J(x) = Jq - 7 j{A“ 1 b,b) + | J(y) 

where j(y) = (y,Ay) . If Cx = 8 then C( -A”^b + y) = 9 or 
Cy = CA - 'Hd . If we let CA”^b = d then minimizing j(x) subject 
to Cx => 0 is equivalent to minimizing J(y) subject to Cy = d. 
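As a quick sanity check of the change of variables above, the following sketch evaluates both sides of the identity for a small, hypothetical 2 × 2 example. The values of A, b, $J_0$ and x are arbitrary choices; the only requirement carried over from the text is that A be symmetric and positive definite.

```python
# Hypothetical 2x2 data; checks J(x) = J0 - (1/2)(A^{-1}b, b) + (1/2)(y, Ay)
# for J(x) = J0 + (b, x) + (1/2)(x, Ax) and the substitution x = y - A^{-1}b.

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def solve2(M, v):
    """Solve the 2x2 system M z = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - v[0] * M[1][0]) / det]

A = [[3.0, 1.0], [1.0, 2.0]]   # symmetric positive definite
b = [1.0, -4.0]
J0 = 0.5

def J(x):
    return J0 + dot(b, x) + 0.5 * dot(x, matvec(A, x))

Ainv_b = solve2(A, b)
x = [0.7, -1.3]
y = [xi + ai for xi, ai in zip(x, Ainv_b)]          # y = x + A^{-1} b
lhs = J(x)
rhs = J0 - 0.5 * dot(Ainv_b, b) + 0.5 * dot(y, matvec(A, y))
```

Setting y = 0, that is, $x = -A^{-1}b$, recovers the unconstrained minimum value $J_0 - \tfrac{1}{2}(A^{-1}b, b)$.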

We shall examine the problem of minimizing $\tilde{J}(y)$ subject to $Cy = d$ and then see what this tells us about the original problem, that is, to minimize $J(x)$ subject to $Cx = 0$.

We shall define a functional $(\ ,\ )_1 : H \times H \to R$ by $(x,y)_1 = (x,Ay)$ for all $x, y \in H$. Notice that for any $x \in H$, $(x,x)_1 = (x,Ax) \ge m(x,x)$ (1.5), so that if $x \ne 0$, $(x,x)_1 > 0$, and $(x,x)_1 = 0$ if and only if $x = 0$. Moreover, the inner product $(\ ,\ )$ is linear in the first term by definition; hence the function $(\ ,\ )_1$ is linear in the first term. Moreover, since $A = A^*$ (1.3), for every $x, y \in H$ we have
$$(x,y)_1 = (x,Ay) = (Ay,x) = (y,A^*x) = (y,Ax) = (y,x)_1.$$
That is, $(\ ,\ )_1$ is symmetric. Hence $(\ ,\ )_1$ is an inner product on the linear space H. We shall denote the space $(H, (\ ,\ )_1)$ by H'.
We can see that H' is complete as follows. Suppose that $(x_p - x_n, A(x_p - x_n)) \to 0$ as $p, n \to \infty$. Then, since for any p, n
$$(x_p - x_n, A(x_p - x_n)) \ge m(x_p - x_n, x_p - x_n) \ge 0$$
by (1.3), $(x_p - x_n, x_p - x_n) \to 0$, and by the completeness of H there exists an $x \in H$ such that $(x_n - x, x_n - x) \to 0$ as $n \to \infty$. Since, by (1.3), $M(x_n - x, x_n - x) \ge (x_n - x, A(x_n - x)) \ge 0$, we have $(x_n - x, A(x_n - x)) \to 0$. Hence H' is complete. Therefore, H' is a Hilbert space.

Now if we denote by M the closed linear subspace of H' which is the null space of C, and if $\bar{y} \in H'$ is such that $C\bar{y} = d$, then the linear variety which satisfies $Cy = d$ is given by $V = \bar{y} + M$.

By the projection theorem [ ] there is a unique vector $y_0$ in V of minimum norm with respect to the H' norm. Further, $y_0$ is characterized by the fact that $y_0$ is the only element of V orthogonal to M with respect to the $(\ ,\ )_1$ inner product.

This means that
$$(y_0, y_0)_1 \le (y, y)_1 \qquad (9)$$
for all $y \in V = \bar{y} + M$, that is, for all y such that $Cy = d$. Moreover, for every y such that $Cy = 0$, $y_0$ is characterized by the fact that
$$(y, y_0)_1 = 0. \qquad (10)$$
That is, in terms of the definition of $(\ ,\ )_1$, we have from (9) and (10)
$$(y_0, Ay_0) \le (y, Ay) \qquad (11)$$
for all y such that $Cy = d$, and
$$(y, Ay_0) = 0 \qquad (12)$$
for all y such that $Cy = 0$. Hence, the solution to the problem of finding the minimum of $\tilde{J}(y)$ is characterized by (11) and (12). In terms of the original problem of minimizing $J(x)$ subject to $Cx = 0$, this means that the problem has a unique solution $\tilde{x} = -A^{-1}b + y_0$, and if x satisfies $Cx = 0$ then by (12)
$$(x, A(\tilde{x} + A^{-1}b)) = 0. \qquad (13)$$
But by (1.7) $A\tilde{x} + b = g(\tilde{x})$. Hence we have that at $\tilde{x}$, $(x, g(\tilde{x})) = 0$ for all x such that $Cx = 0$. That is, $g(\tilde{x})$ is orthogonal to the null space of C, which is D. In other words, the projection of $g(\tilde{x})$ onto D is zero.

Now let us use the modified rank one algorithm to locate $\tilde{x}$. Suppose that the scalar $\alpha_n$ is chosen so that
$$J(x_n + \alpha_n s_n) \le J(x_n + \lambda s_n)$$
for all $\lambda \in R$, that is, $\alpha_n$ is chosen by (2.17). Therefore, the value of $\alpha_n$ is given by (2.19). We apply the modified basic algorithm as discussed in this chapter with the initial estimate $x_0 \in D$ and $V^{(0)} = P$ as defined by (1).
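For a quadratic functional the minimizing scalar in (2.17) has a closed form. Since (2.19) is not reproduced in this excerpt, the sketch below assumes the standard exact line-search formula $\alpha = -(s, g)/(s, As)$ with $g = Ax + b$. It then checks the property $(g_{n+1}, s_n) = 0$ that is used in the argument below. The data and the choice $s = -g$ (that is, $V = I$) are hypothetical.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def exact_step(A, b, x, s):
    """Minimizer over alpha of J(x + alpha s) for J = J0 + (b, x) + (1/2)(x, Ax)."""
    g = [gi + bi for gi, bi in zip(matvec(A, x), b)]    # gradient (1.7): g = Ax + b
    return -dot(s, g) / dot(s, matvec(A, s))

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]
x = [1.0, 0.0, 0.0]
g = [gi + bi for gi, bi in zip(matvec(A, x), b)]
s = [-gi for gi in g]                  # here V = I, so s = -g
alpha = exact_step(A, b, x, s)
x_new = [xi + alpha * si for xi, si in zip(x, s)]
g_new = [gi + bi for gi, bi in zip(matvec(A, x_new), b)]
```

Because $\alpha$ is exact, the new gradient `g_new` is orthogonal to the search direction up to rounding.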

We shall now establish conditions which will guarantee that the projection onto D of the gradient at the iterates tends to zero. As shown above, this is a necessary and sufficient condition for a minimum.

By (2.15), we have $(y_n, r_n) = (y_n, (V^{(n)} - A^{-1})y_n)$. Then from (1.12) and (2.5) we have
$$(y_n, A^{-1}y_n) = (g_{n+1} - g_n, \sigma_n) = (g_{n+1}, \sigma_n) - (g_n, \sigma_n).$$
By the choice of $\alpha_n$ we know that $(g_{n+1}, \sigma_n) = 0$. Hence, by the definition of $\sigma_n$, we have
$$(y_n, A^{-1}y_n) = \alpha_n (g_n, V^{(n)} g_n). \qquad (13)$$
Also by (2.5)
$$(y_n, V^{(n)} y_n) = (g_{n+1}, V^{(n)} g_{n+1}) - (g_{n+1}, V^{(n)} g_n) - (g_n, V^{(n)} g_{n+1}) + (g_n, V^{(n)} g_n). \qquad (14)$$
Therefore, by theorem 2.2 and (2.18), $(g_{n+1}, V^{(n)} g_n) = 0$ and $(g_n, V^{(n)} g_{n+1}) = 0$. Hence (14) becomes
$$(y_n, V^{(n)} y_n) = (g_{n+1}, V^{(n)} g_{n+1}) + (g_n, V^{(n)} g_n). \qquad (15)$$
Hence
$$(y_n, (V^{(n)} - A^{-1})y_n) = (g_{n+1}, V^{(n)} g_{n+1}) + (1 - \alpha_n)(g_n, V^{(n)} g_n).$$
Therefore we can say:

Lemma 5.3: If $V^{(n)}$ is a positive operator on D and $\alpha_n < 1$, then $(y_n, (V^{(n)} - A^{-1})y_n) = (y_n, r_n) > 0$.

Lemma 5.4: If $V^{(0)}$ is the projection operator onto D and the $V^{(i)}$ are positive, uniformly bounded linear operators on D with bound $K > 0$, then
$$(g_i, V^{(i)} g_i) \ge \frac{\|V^{(i)} g_i\|^2}{K}. \qquad (16)$$


Proof: Define $(\ ,\ )_i : H \times H \to R$ such that $(x,y)_i = (x, V^{(i)}y)$ for all $x, y \in H$. By lemma 5.1, $V^{(i)} = V^{(0)}V^{(i)}V^{(0)}$, so if $x \in H$ then $V^{(0)}x \in D$; hence $(x,x)_i = (x, V^{(i)}x) = (V^{(0)}x, V^{(i)}(V^{(0)}x)) \ge 0$, since $V^{(i)}$ is positive on D. Therefore the Schwarz inequality holds for each i, that is, $(x,y)_i^2 \le (x,x)_i (y,y)_i$. Hence
$$\|V^{(i)} g_i\|^4 = (V^{(i)} g_i, V^{(i)} g_i)^2 = (V^{(i)} g_i, g_i)_i^2$$
$$\le (V^{(i)} g_i, V^{(i)} g_i)_i \, (g_i, g_i)_i = (V^{(i)} g_i, V^{(i)}(V^{(i)} g_i)) \, (g_i, V^{(i)} g_i)$$
$$\le K \|V^{(i)} g_i\|^2 \, (g_i, V^{(i)} g_i).$$
Therefore, if $V^{(i)} g_i \ne 0$, we have $K(g_i, V^{(i)} g_i) \ge \|V^{(i)} g_i\|^2$. Hence
$$(g_i, V^{(i)} g_i) \ge \frac{\|V^{(i)} g_i\|^2}{K},$$
and this inequality holds trivially when $V^{(i)} g_i = 0$.
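Inequality (16) is the operator fact $V^2 \le KV$ for a positive operator V bounded by K. The sketch below checks it numerically for one hypothetical 2 × 2 symmetric positive definite matrix. For this check D is taken to be the whole space and K is the largest eigenvalue of V.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

V = [[2.0, 1.0], [1.0, 1.0]]          # symmetric positive definite
K = (3.0 + math.sqrt(5.0)) / 2.0      # its largest eigenvalue, so (v, Vv) <= K (v, v)

def gap(gvec):
    """(g, Vg) - ||Vg||^2 / K, which (16) says is nonnegative."""
    Vg = matvec(V, gvec)
    return dot(gvec, Vg) - dot(Vg, Vg) / K

# sample unit vectors around the circle
gaps = [gap([math.cos(0.1 * k), math.sin(0.1 * k)]) for k in range(63)]
```

Equality is attained along the eigenvector belonging to K, so the bound in (16) cannot be improved in general.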


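By our choice of $\alpha_n$ we know that (2.20) holds. Hence
$$\lim_{n \to \infty} \frac{(s_n, g_n)^2}{(s_n, As_n)} = 0.$$
Since
$$\frac{(s_i, g_i)^2}{(s_i, As_i)} \ge \frac{(g_i, V^{(i)} g_i)^2}{M \|V^{(i)} g_i\|^2} \ \ \text{(by (1.5) and (2.2))} \ \ \ge \frac{(g_i, V^{(i)} g_i)}{KM} \ \ \text{(by (16))}, \qquad (17)$$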
we have by (17) that $(g_i, V^{(i)} g_i) \to 0$ as $i \to \infty$. Moreover, since by (16) $\|V^{(i)} g_i\|^2 \le K(g_i, V^{(i)} g_i)$, we obtain
$$(g_i, V^{(i)} g_i) \to 0 \quad \text{and} \quad \|V^{(i)} g_i\| \to 0 \quad \text{as } i \to \infty. \qquad (18)$$


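If $(r_i, y_i) \ne 0$, we have, in view of (2.14) and (2.10), for any $x \in H$,
$$V^{(i)}x = V^{(0)}x + \sum_{l=0}^{i-1} \frac{(r_l, x)}{(r_l, y_l)}\, r_l. \qquad (19)$$
Hence
$$(x, V^{(i)}x) = (x, V^{(0)}x) + \sum_{l=0}^{i-1} \frac{(r_l, x)^2}{(r_l, y_l)}. \qquad (20)$$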


Recall from step 2 of the basic algorithm that if $(r_j, y_j) = 0$ for some j, then $V^{(j+1)} = V^{(j)}$, so that the term containing $(r_j, y_j)$ in the sum given in (19) or (20) is not present. We shall assume that if $(r_j, y_j) = 0$ for some j, we have not included that term in the sum in (19) or (20). Recall from lemma 5.3 that if $\alpha_l < 1$ for $l = 0, 1, 2, \ldots, i - 1$, then $(r_l, y_l) > 0$. Hence we have
$$(x, V^{(i)}x) \ge (x, V^{(0)}x)$$
$$= (x, V^{(0)}V^{(0)}x) \quad \text{since } V^{(0)} = V^{(0)}V^{(0)}$$
$$= (V^{(0)}x, V^{(0)}x) \quad \text{since } (V^{(0)})^* = V^{(0)}. \qquad (21)$$
From (21) with $x = g_i$ we have the following.
Theorem 5.2: If $\alpha_i < 1$ for all i and the $V^{(i)}$ are uniformly bounded positive operators on D, then $\|V^{(0)} g_i\| \to 0$ as $i \to \infty$.

Proof: If $\alpha_i < 1$ for all i, then by (21) with $x = g_i$ we have $(g_i, V^{(i)} g_i) \ge \|V^{(0)} g_i\|^2 \ge 0$, and $(g_i, V^{(i)} g_i) \to 0$ as $i \to \infty$ by (18). Hence $\|V^{(0)} g_i\|^2 \to 0$.

We have now established conditions which guarantee that the projection on D of the gradient of the quadratic functional evaluated at the iterates tends to zero. Notice that if M as defined in (1.4) is such that $M \le 1$, then, since $\|P\| \le 1$,
$$(x, Px) \le (x, x) \le \frac{1}{M}(x, x) \le (x, A^{-1}x) \quad \text{for all } x \in H,$$
that is, $P \le A^{-1}$. Since $V^{(0)} = P \le A^{-1}$, we have by theorem 2.6 that $V^{(n)} \le A^{-1}$ for every n. Hence the $V^{(n)}$ are uniformly bounded.

5.2 Linear Equality Constraints of the Type Cx = ω

Suppose the problem is to compute the location of the minimum of a differentiable function $J: H \to R$, with gradient $g: H \to H$, subject to the constraint that $Cx = \omega$. Here C is a bounded linear operator from H into $\tilde{H}$, $\tilde{H}$ is another Hilbert space, and $\omega$ is a fixed element of $\tilde{H}$. That is, we wish to find $\tilde{x} \in H$ such that $C\tilde{x} = \omega$ and $J(x) \ge J(\tilde{x})$ for all $x \in H$ such that $Cx = \omega$. With a slight modification, we can apply the basic algorithm outlined in chapter 2 to this problem. Moreover, we can show that the sequence of iterates $x_1, x_2, \ldots, x_n, \ldots$ generated by this modification is such that $Cx_k = \omega$ for each k.
Let $V^{(0)}$ be the projection operator of H onto the null space of C (a closed subspace of H, since C is bounded). Let $x_0$ be such that $Cx_0 = \omega$ (22) and apply the algorithm. Now $x_1 = x_0$ or $x_1 = x^*$, where $x^* = x_0 - \alpha_0 V^{(0)} g_0$. Consider
$$Cx^* = Cx_0 - C(\alpha_0 V^{(0)} g_0) = \omega - \alpha_0 C(V^{(0)} g_0), \qquad (22)$$
where $V^{(0)} g_0$ is in the null space of C by the choice of $V^{(0)}$. Hence $Cx^* = \omega$ for all $\alpha_0 \in R$. Therefore $Cx_1 = \omega$ in either case.

Since either $x_{n+1} = x_n$ or $x_{n+1} = x_n - \alpha_n V^{(n)} g_n$, we know that if $Cx_n = \omega$ and $x_{n+1} = x_n$, then $Cx_{n+1} = \omega$. Otherwise, we consider $Cx_{n+1} = Cx_n - C\alpha_n V^{(n)} g_n$. The proof of lemma 5.2 depended only upon the facts that $V^{(0)}V^{(0)} = V^{(0)}$ and $(V^{(0)})^* = V^{(0)}$, and these hold for the projection operator onto the null space of C. Hence $V^{(n)} g_n = V^{(0)}(V^{(n)} g_n)$, so $V^{(n)} g_n$ is in the null space of C. Therefore $C(\alpha_n V^{(n)} g_n) = 0$, and hence $Cx_{n+1} = Cx_n = \omega$. By mathematical induction we have thus established the following theorem:
Theorem 5.3: If $V^{(0)}$ is the projection operator onto the null space of C, $V^{(n)}$ is defined as in (2.10), and $Cx_0 = \omega$, then $Cx_n = \omega$ for all n, where the $x_n$'s are the iterates generated by the algorithm outlined in chapter 2.
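The sketch below is a minimal finite-dimensional illustration of Theorem 5.3, written under several stated assumptions. It minimizes $J(x) = (b, x) + \tfrac{1}{2}(x, Ax)$ subject to $Cx = \omega$, starting from $x_0$ with $Cx_0 = \omega$ and $V^{(0)} = P$, the orthogonal projector onto the null space of C. The step length comes from an exact line search. The correction is the textbook symmetric rank-one (SR1) update $V \leftarrow V + rr^T/(r, y)$ with $r = \sigma - Vy$. This is an assumption: the thesis' own parametrized update (2.9)-(2.15) is not reproduced here, and its sign convention for $r_n$ differs. The constraint $x_1 + x_2 + x_3 = 1$ and the data A, b are hypothetical.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def projected_rank_one(A, b, P, x0, iters=10, tol=1e-14):
    """Rank-one quasi-Newton iteration with V(0) = P; every step lies in null(C)."""
    n = len(x0)
    V = [row[:] for row in P]                          # V(0) = P
    x = list(x0)
    g = [ai + bi for ai, bi in zip(matvec(A, x), b)]   # g = A x + b
    for _ in range(iters):
        s = [-c for c in matvec(V, g)]                 # s = -V g, in null(C)
        sAs = dot(s, matvec(A, s))
        if sAs <= tol:
            break                                      # projected gradient vanished
        alpha = -dot(s, g) / sAs                       # exact line search
        sigma = [alpha * c for c in s]
        x = [xi + si for xi, si in zip(x, sigma)]
        g_new = [ai + bi for ai, bi in zip(matvec(A, x), b)]
        y = [gn - go for gn, go in zip(g_new, g)]
        r = [si - vi for si, vi in zip(sigma, matvec(V, y))]
        denom = dot(r, y)
        if abs(denom) > tol:                           # skip the update when (r, y) = 0
            V = [[V[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]
        g = g_new
    return x, g

# Example: C = [1 1 1], omega = 1, so P = I - (1/3) * ones.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]
P = [[(1.0 if i == j else 0.0) - 1.0 / 3.0 for j in range(3)] for i in range(3)]
x_opt, g_opt = projected_rank_one(A, b, P, [1.0, 0.0, 0.0])
```

Every correction $rr^T/(r, y)$ has r in the null space of C, so V keeps the form $PVP$ and $Cx_n = \omega$ holds up to rounding. This is the mechanism behind lemma 5.2 and theorem 5.3.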

We shall now show that the problem considered in section 1 of this chapter is of the type examined in this section. The problem is that of finding $\tilde{x} \in D$, D a closed subspace of H, such that $J(\tilde{x}) \le J(x)$ for all $x \in D$, where J is a differentiable function. Suppose we let P denote the projection operator of H onto D, and we define the bounded linear operator C from H into H by $C = I - P$, where I is the identity operator on H. Then the problem can be seen as that of minimizing J subject to $Cx = 0$. Therefore, the problem of section 1 is a special case of the problems considered in this section. Hence theorem 5.1 follows from theorem 5.3.
6. APPLICATION TO OPTIMAL CONTROL THEORY

In this chapter the results of the first five chapters are used to develop a method of computing the solution of various types of optimal control problems. We shall consider fixed-time problems, since by a simple transformation [ ] the free-time problem can be transformed into a fixed-time problem. Moreover, Horwitz and Sarachik [21] have given several other schemes for solving the free-time problem using fixed-time techniques, and these schemes are applicable when the basic algorithm outlined in chapter 2 is used. Also, Leondes and Niemann [ ] have proposed a computational scheme for handling the free-time problem by using fixed-time techniques.

6.1 A Quadratic Payoff With Linear Constraining 
Differential Equations 

From the class $L_r^2[t_0, t_1]$ we wish to find that function $u^*(t)$ which minimizes
$$J[u] = \tfrac{1}{2}\int_{t_0}^{t_1} \{x^T(t)P(t)x(t) + u^T(t)R(t)u(t)\}\,dt \qquad (1)$$
subject to the constraints
$$\dot{x}(t) = G(t)x(t) + B(t)u(t) \qquad (2)$$
and $x(t_0) = x_0$, where $x_0$, $t_0$, and $t_1$ are fixed.



Here x is an n-vector,

u is an r-vector,

G(t) is an n × n matrix with components in $L^1[t_0, t_1]$,

B(t) is an n × r matrix with components in $L^1[t_0, t_1]$, and bounded,

P(t) is an n × n symmetric, positive semi-definite matrix the components of which are piecewise continuous on $[t_0, t_1]$, and

R(t) is an r × r symmetric, uniformly positive definite matrix the components of which are piecewise continuous on $[t_0, t_1]$.

Horwitz and Sarachik [20] have shown that this problem can be considered as that of finding the location of the minimum of a quadratic functional on $L_r^2[t_0, t_1]$. This can be seen by defining the following linear operators (3): for $y \in L_n^2[t_0, t_1]$ and $z \in L_r^2[t_0, t_1]$,
$$(Py)(t) = P(t)y(t)$$
$$(Rz)(t) = R(t)z(t)$$
$$(Ey)(t) = \Phi(t, t_0)y(t)$$
$$(Fz)(t) = \int_{t_0}^{t} \Phi(t, \tau)B(\tau)z(\tau)\,d\tau$$
where $\dot{\Phi}(t, t_0) = G(t)\Phi(t, t_0)$ with $\Phi(t_0, t_0) = I$.

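It is well known that for any $u \in L_r^2[t_0, t_1]$, $x = Ex_0 + Fu$, so that (1) becomes
$$J[u] = \tfrac{1}{2}\langle Ex_0 + Fu, P(Ex_0 + Fu)\rangle + \tfrac{1}{2}\langle u, Ru\rangle,$$
where $\langle\ ,\ \rangle$ is the usual inner product defined on $L^2$. Hence
$$J[u] = \tfrac{1}{2}\langle Ex_0, PEx_0\rangle + \tfrac{1}{2}\langle u, F^*PEx_0\rangle + \tfrac{1}{2}\langle (PF)^*Ex_0, u\rangle + \tfrac{1}{2}\langle u, (F^*PF + R)u\rangle. \qquad (6)$$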


If we let
$$J_0 = \tfrac{1}{2}\langle Ex_0, PEx_0\rangle, \quad w = F^*PEx_0, \quad A = F^*PF + R, \qquad (7)$$
then, since P is symmetric so that $(PF)^* = F^*P$, (6) becomes
$$J[u] = J_0 + \langle w, u\rangle + \tfrac{1}{2}\langle u, Au\rangle. \qquad (8)$$

Moreover, since P is positive semi-definite and R is uniformly positive definite, A is a strongly positive linear operator. Hence J[u] as given by (8) is a quadratic functional on the real Hilbert space $L_r^2[t_0, t_1]$ of the type discussed in chapter 1. By (1.7) the gradient of J is given by
$$g(u) = Au + w. \qquad (9)$$
Moreover, this is exactly the type of functional for which the conditions given in theorems 2.10 and 2.11 and the corollary to theorem 4.1 guarantee the convergence of the various modifications of the basic algorithm.
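To make (7)-(9) concrete, the sketch below builds the discrete analogues of A, w and $J_0$ for a hypothetical scalar instance of (1)-(2): $\dot{x} = ax + bu$, $x(0) = x_0$ on $[0, t_1]$, with constant weights p and r. The discretization is an assumption made only for illustration: explicit Euler with n steps and a rectangle rule for the integral, so that $x = Ex_0 + Fu$ becomes a matrix identity. For brevity the minimizer $-A^{-1}w$ (theorem 1.2) is computed by a direct solve instead of by the iterative algorithm.

```python
def build_operators(a, b, p, r, x0, t1, n):
    """Discrete A = h (p F^T F + r I), w = h p F^T E, J0 = (h/2) p (E, E)."""
    h = t1 / n
    phi = 1.0 + h * a                        # one Euler step of dx/dt = a x
    E = [phi ** k * x0 for k in range(n)]   # free response x_k for u = 0
    # F[k][j]: effect of u_j on x_k (only j < k contributes)
    F = [[phi ** (k - 1 - j) * h * b if j < k else 0.0 for j in range(n)]
         for k in range(n)]
    A = [[h * (p * sum(F[k][i] * F[k][j] for k in range(n)) + (r if i == j else 0.0))
          for j in range(n)] for i in range(n)]
    w = [h * p * sum(F[k][i] * E[k] for k in range(n)) for i in range(n)]
    J0 = 0.5 * h * p * sum(e * e for e in E)
    return A, w, J0

def simulate_cost(a, b, p, r, x0, t1, u):
    """J_h[u] = (h/2) sum_k (p x_k^2 + r u_k^2), integrating the state directly."""
    n = len(u)
    h = t1 / n
    x, total = x0, 0.0
    for k in range(n):
        total += 0.5 * h * (p * x * x + r * u[k] * u[k])
        x += h * (a * x + b * u[k])
    return total

def solve(M, v):
    """Gaussian elimination with partial pivoting (M and v are not modified)."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(c + 1, n):
            fac = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= fac * aug[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z

a, b, p, r, x0, t1, n = -1.0, 1.0, 1.0, 0.5, 2.0, 1.0, 20   # hypothetical data
A, w, J0 = build_operators(a, b, p, r, x0, t1, n)
u_star = solve(A, [-wi for wi in w])                          # minimizer -A^{-1} w
grad = [sum(A[i][j] * u_star[j] for j in range(n)) + w[i] for i in range(n)]  # (9)
```

The discrete cost obtained by simulating the state agrees exactly with the quadratic form $J_0 + \langle w, u\rangle + \tfrac{1}{2}\langle u, Au\rangle$, which is the content of (8).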

Note that if we wish to find the location of the minimum of (1) subject to (2) but with $x(t_0) = \tilde{x}_0$ as initial condition, we can repeat the definitions given in (3) and (7). Then the vector w and the scalar $J_0$ defined by (7) are changed to $\tilde{w}$ and $\tilde{J}_0$, say. However, the operator A, also defined by (7), is unchanged.

From theorem 1.2, the location of the minimum of the resulting quadratic functional $\tilde{J}[u] = \tilde{J}_0 + \langle \tilde{w}, u\rangle + \tfrac{1}{2}\langle u, Au\rangle$ is given by $-A^{-1}\tilde{w}$. Since the operator $V^{(n)}$, which we computed when solving for the minimum of (8), converges pointwise to $A^{-1}$ by theorem 2.8, we can use it as our new initial estimate of $A^{-1}$. In this fashion we can accelerate the convergence of the iterates for the second problem, that is, of computing the location of the minimum of $\tilde{J}$.


6.2 General Optimal Control Problems and the 
Gradient of the Payoff 

In this section we shall describe a class of problems generally referred to as optimal control problems [29], or in the calculus of variations as Lagrange problems [33]. Also, we shall show how to apply the algorithms discussed in chapters 2 and 4 to compute solutions to these problems.

Suppose we have a system of n differential equations
$$\dot{x}(t) = f(x, u, t) \qquad (10)$$
with $x(t_0) = x^0$ and $u \in R^r$. We wish to choose a function $u = u(t)$ which minimizes the value of $\int_{t_0}^{t_1} L(x(t), u(t), t)\,dt$.

We shall assume that f(x, u, t) and L(x, u, t) have continuous partial derivatives of at least second order in x and u and are piecewise continuous in t. Also, we shall assume that there are no constraints on u or x, other than that x must satisfy (10).

Moreover, we shall assume that L and f are such that corresponding to every $u = u(t) \in L_r^2[t_0, t_1]$, a real Hilbert space, there exists a solution $x = x(t)$ of (10), and that for this x and u the integral $\int_{t_0}^{t_1} L(x(t), u(t), t)\,dt$ exists. By a solution to (10) we mean, as is usual in ordinary differential equations, an absolutely continuous function $\varphi = \varphi(t)$ such that $\varphi(t_0) = x^0$ and $\dot{\varphi}(t) = f(\varphi(t), u(t), t)$ almost everywhere, for some $u = u(t)$. By the continuity conditions on L and f, if we can restrict our attention to a compact subset of (t, x) space for all u, then standard results of differential equations theory concerning existence and uniqueness of solutions hold [16, 17]. An assumption on f and L which guarantees this is that there exists a scalar C such that for all $t \in [t_0, t_1]$, x, and u,
$$(\hat{x}, \hat{f}(\hat{x}, u, t)) \le C(1 + |\hat{x}|^2), \qquad (11)$$
where $\hat{f}$ and $\hat{x}$ denote the vectors (L, f) and $(x_0, x)$ respectively, with $\dot{x}_0 = L(x, u, t)$ and $x_0(t_0) = 0$. This implies $(\hat{x}, \dot{\hat{x}}) \le C(1 + |\hat{x}|^2)$, so that
$$1 + |\hat{x}(t)|^2 \le (1 + |\hat{x}(t_0)|^2)\, e^{2C(t - t_0)}.$$
The above inequality is shown by Hermes and LaSalle in [16]. Hence we can define the functional $J: H \to R$,
$$J[u] = \int_{t_0}^{t_1} L(x(t), u(t), t)\,dt,$$
where x(t) is a solution of (10) corresponding to u.

Therefore, our problem appears to be that of locating the minimum of a functional J on a real Hilbert space H. In order to apply the algorithms discussed in chapters 2 and 4, we must compute the gradient of J. The gradient of J is that part of $J[u + \delta u] - J[u]$ which is linear in $\delta u$. From (10) we have


(t) = x° 4 J' f (x(t),u(t) 4 * S"Uj f)dt 


p t 

x(t) = x° + J f(x(r),u(T),T)dT 


Therefore, 


( 12 ) 


x(t) — x(t) = J {f(x(T) jU (r) 4 5u,t) - f (x(t),u(t), - 0 }dt. 
t o 

If we let 6 x denote the linear part of x(t) - x(t), then 

Sx = f x Sx + f u Su (13) 

r 

with 5 x(t 0 ) = 0 where f x denotes the n x n matrix ~ and 
Sf Sx 

f VL ~ si an . n x r matrix, evaluated at (x(t),u(t),t) . 

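Moreover,
$$J[u + \delta u] - J[u] = \int_{t_0}^{t_1} (L(\tilde{x}, u + \delta u, t) - L(x, u, t))\,dt, \qquad (14)$$
and if we let $\delta J$ denote that portion of (14) which is linear in $\delta x$ and $\delta u$, then
$$\delta J = \int_{t_0}^{t_1} (L_x\,\delta x + L_u\,\delta u)\,dt, \qquad (15)$$
where $L_x$ denotes $\partial L/\partial x$ and $L_u = \partial L/\partial u$, both evaluated at (x(t), u(t), t).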
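We then let $\lambda(t)$ be an n-vector valued function satisfying
$$\dot{\lambda}(t) = -f_x^T \lambda - L_x^T \qquad (16)$$
with $\lambda(t_1) = 0$. Then we have from (16) and (13)
$$\frac{d(\lambda^T \delta x)}{dt} = \dot{\lambda}^T \delta x + \lambda^T (\delta\dot{x}) = -\lambda^T (f_x\,\delta x) - L_x\,\delta x + \lambda^T f_x\,\delta x + \lambda^T f_u\,\delta u, \qquad (17)$$
so that, integrating (17) from $t_0$ to $t_1$, we have
$$\lambda^T(t_1)\,\delta x(t_1) - \lambda^T(t_0)\,\delta x(t_0) = -\int_{t_0}^{t_1} L_x\,\delta x\,dt + \int_{t_0}^{t_1} \lambda^T f_u\,\delta u\,dt,$$
and since $\lambda(t_1) = 0 = \delta x(t_0)$, we have
$$\int_{t_0}^{t_1} L_x\,\delta x\,dt = \int_{t_0}^{t_1} \lambda^T f_u\,\delta u\,dt. \qquad (18)$$
So, substituting (18) in (15), we get
$$\delta J = \int_{t_0}^{t_1} (\lambda^T f_u + L_u)\,\delta u\,dt.$$
Hence, the gradient of J is given by
$$g(u) = \frac{\partial L}{\partial u}(x(t), u(t), t)^T + f_u^T(x(t), u(t), t)\,\lambda(t). \qquad (19)$$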
where $\lambda$, given by (16), can be thought of as an integrating factor for (13) [25, 35].

It is seen from (19) that if we define the Hamiltonian to be
$$H(x, \lambda, t, u) = L(x, u, t) + \lambda^T f(x, u, t), \qquad (20)$$
then the gradient of J at u is given by
$$\nabla_u H = (L_u(x(t), u(t), t) + \lambda^T(t) f_u(x(t), u(t), t))^T, \qquad (21)$$
where
$$\dot{x}(t) = f(x(t), u(t), t) = \frac{\partial H}{\partial \lambda}, \quad x(t_0) = x^0,$$
$$\dot{\lambda}(t) = -\frac{\partial H}{\partial x}, \quad \lambda(t_1) = 0.$$

The computational steps necessary to compute the gradient of J at $u = u_0(t)$ are as follows. First integrate $\dot{x} = f(x, u_0, t)$ with $x(t_0) = x^0$ forward to $t = t_1$. Then, starting at $t = t_1$, integrate
$$\dot{\lambda} = -f_x^T(x, u_0, t)\lambda - L_x^T(x, u_0, t)$$
with $\lambda(t_1) = 0$ backward to $t = t_0$. We can then compute the gradient as given in (19) using the control $u = u_0(t)$ and the values of x(t) and $\lambda(t)$ computed above. If the gradient is computed according to (19), then $B^{(n)}$ and $r_n$ can be computed as in (2.10) and (2.5) by following the algorithms outlined in chapters 2 and 4. Hence, these algorithms can be used directly to compute the optimal control.
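The forward/backward sweep just described can be sketched as follows. The sketch assumes explicit Euler time stepping and a rectangle rule for the cost, $J \approx h\sum_k L(x_k, u_k)$. With that discretization, the backward recursion $\lambda_k = \lambda_{k+1} + h(f_x^T\lambda_{k+1} + L_x^T)$ is a discrete form of (16), and $g_k = L_u^T + f_u^T\lambda_{k+1}$ is a discrete form of (19). The resulting gradient is exactly $\partial J/\partial u_k = h\,g_k$ for the discretized J, so it can be checked against finite differences. The test problem ($\dot{x} = -x^3 + u$, $L = x^2 + u^2$) is hypothetical.

```python
import math

def adjoint_gradient(f, f_x, f_u, L, L_x, L_u, x0, u, h):
    """Forward/backward sweep; returns (J, g) with dJ/du_k = h * g[k].

    States and controls are lists of floats; f_x and f_u return Jacobians as
    nested lists whose rows index the components of f.
    """
    N = len(u)
    xs = [list(x0)]
    J = 0.0
    for k in range(N):                                   # forward: integrate (10)
        J += h * L(xs[k], u[k])
        xs.append([xi + h * fi for xi, fi in zip(xs[k], f(xs[k], u[k]))])
    n = len(x0)
    lam = [0.0] * n                                      # lambda(t1) = 0
    g = [None] * N
    for k in reversed(range(N)):                         # backward: integrate (16)
        fu, Lu = f_u(xs[k], u[k]), L_u(xs[k], u[k])
        g[k] = [Lu[j] + sum(fu[i][j] * lam[i] for i in range(n)) for j in range(len(Lu))]
        fx, Lx = f_x(xs[k], u[k]), L_x(xs[k], u[k])
        lam = [lam[i] + h * (sum(fx[m][i] * lam[m] for m in range(n)) + Lx[i])
               for i in range(n)]
    return J, g

# Hypothetical test problem: dx/dt = -x^3 + u, L = x^2 + u^2, x(0) = 1 on [0, 1].
f = lambda x, u: [-x[0] ** 3 + u[0]]
f_x = lambda x, u: [[-3.0 * x[0] ** 2]]
f_u = lambda x, u: [[1.0]]
L = lambda x, u: x[0] ** 2 + u[0] ** 2
L_x = lambda x, u: [2.0 * x[0]]
L_u = lambda x, u: [2.0 * u[0]]

h, N = 0.01, 100
u = [[0.5 * math.cos(0.1 * k)] for k in range(N)]
J, g = adjoint_gradient(f, f_x, f_u, L, L_x, L_u, [1.0], u, h)
```

The sweep costs one forward and one backward integration per gradient, whatever the dimension of the control. This is why the algorithms of chapters 2 and 4 need only one pair of integrations per iteration when no extra line-search evaluations are made.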
6.3 Computing the Optimal Control for a Problem With a Compact,

Convex Control Region Via the Algorithm of Chapter 2
The problem considered in section 2 of this chapter, which we shall call the first problem, is to find a function u such that
$$\int_{t_0}^{t_1} L(x(t), u(t), t)\,dt \to \min \qquad (22)$$
subject to $\dot{x} = f(x, u, t)$ and $x(t_0) = x^0$. This problem is not entirely typical of optimal control problems, in that the range of u is unrestricted. For a large class of those problems generally considered to be optimal control problems, the function u is a member of $L_r^2[t_0, t_1]$ and has its range in some subset U of $R^r$. U is called the control region of the problem [ ]. For (22), we assume that f and L are as in section 6.2, except that (11) holds for f at every x, t and $u \in U$.

Problems for which U is a convex, compact subset of $R^r$, and which can be transformed into control problems with no special restriction on U, were examined by Park [28]. He showed that an optimal control problem such as (22), for which U is a convex and compact subset of a Euclidean space, can be transformed into an "equivalent" problem whose associated control region is a Euclidean space of dimension p. Hence, the new control variables have no restriction on their range. We shall see that this "equivalent" problem can be seen as that of locating the minimum value of a functional defined on a Hilbert space. The algorithms which we have previously discussed can be used to compute the location of the minimum of this functional, and the results can then be transformed back to the original problem.
A problem of the type investigated by Park is to find an $L_r^2[t_0, t_1]$ function $u = u(t)$ with range in $U \subset E^r$, U a compact convex set, such that (22) holds. Let
$$\psi: R^p \to R^r \qquad (23)$$
be a map of the type discussed by Park, that is, $\psi$ is continuous and onto U, and there exists a compact subset Z of $R^p$ such that
$$\psi(Z) = U. \qquad (24)$$
By Filippov's Lemma [ ], for every admissible control u, that is, $u \in L_r^2[t_0, t_1]$ with range in U, there exists a bounded measurable function $z: [t_0, t_1] \to Z$ such that for every t,
$$u(t) = \psi(z(t)). \qquad (25)$$


Let us suppose that the problem to be solved is as in (22), where $u \in L_r^2[t_0, t_1]$ has its range in U, a compact, convex subset of $R^r$. Let $\psi$ and Z be as in (23) and (24). The "equivalent" problem, which we will call problem 2, then is to find $y: [t_0, t_1] \to R^p$ such that
$$\int_{t_0}^{t_1} L(x(t), \psi(y(t)), t)\,dt \to \min \qquad (26)$$
subject to $\dot{x} = f(x(t), \psi(y(t)), t)$ and $x(t_0) = x^0$, where $y \in L_p^2[t_0, t_1]$.

In problem 2, we are minimizing over the Hilbert space $L_p^2[t_0, t_1]$, not over a subset of $L_r^2[t_0, t_1]$ as in problem 1. This follows because every $y \in L_p^2[t_0, t_1]$ is measurable, and since $\psi$ is a given continuous function, $\psi \circ y$ is measurable. Moreover, $\psi$ has the bounded range U; hence $\psi \circ y$ is bounded and measurable on $[t_0, t_1]$. Therefore, for any $y \in L_p^2[t_0, t_1]$, $\psi \circ y$ is an admissible control, that is, an element of $L_r^2[t_0, t_1]$ with range in U. Conversely, for any admissible control u, the corresponding function z given by Filippov's Lemma (25) is measurable and has its range in Z, a compact set. Hence z is bounded, and therefore $z \in L_p^2[t_0, t_1]$. Hence we see that the space of admissible controls for problem 2 is all of $L_p^2[t_0, t_1]$, a Hilbert space, whereas the "equivalent" problem 1 had as its admissible controls a subset of $L_r^2[t_0, t_1]$.

Note that if the transformation $\psi$ given in (23) has continuous derivatives of second order, then problem 2 is of the type discussed in section 2 of this chapter. Hence the computation of the location of the minimum can be carried out by the algorithms given in chapters 2 and 4, and the gradient of the functional to be minimized in problem 2 is given by
$$g(y(t)) = (\lambda^T(t) f_u(x(t), \psi(y(t)), t)\,\psi_y(y(t)) + L_u(x(t), \psi(y(t)), t)\,\psi_y(y(t)))^T, \qquad (27)$$
where $\dot{\lambda}(t) = -f_x^T(x(t), \psi(y(t)), t)\lambda(t) - L_x^T(x(t), \psi(y(t)), t)$ with $\lambda(t_1) = 0$. This gradient is found by applying to problem 2 the same techniques used to obtain (19).
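The practical content of (27) is the chain rule: the gradient with respect to y is the gradient with respect to u, evaluated at $u = \psi(y)$, multiplied by $\psi_y$. The sketch below assumes a hypothetical scalar map $\psi(y) = \sin y$, which is continuous, maps R onto U = [-1, 1], and satisfies $\psi([-\pi/2, \pi/2]) = U$. It also uses a toy payoff without dynamics, so that the chain rule can be checked in isolation against finite differences.

```python
import math

def psi(y):
    """Hypothetical psi: R -> U = [-1, 1]; continuous, onto, psi([-pi/2, pi/2]) = U."""
    return math.sin(y)

def dpsi(y):
    return math.cos(y)

def cost_u(u, h):
    """Toy payoff J[u] = sum_k h (u_k - 0.5)^2 on a grid (no dynamics)."""
    return sum(h * (uk - 0.5) ** 2 for uk in u)

def grad_u(u):
    """Gradient density in u: dJ/du_k = h * grad_u(u)[k]."""
    return [2.0 * (uk - 0.5) for uk in u]

def grad_y(y):
    """Chain rule as in (27): g_y(t) = psi'(y(t)) * g_u(psi(y(t)))."""
    u = [psi(yk) for yk in y]
    return [dpsi(yk) * gk for yk, gk in zip(y, grad_u(u))]

h = 0.1
y = [0.3 * k - 1.0 for k in range(10)]
u = [psi(yk) for yk in y]        # always an admissible control with range in U
gy = grad_y(y)
```

Every y produces an admissible u, so the search in y is unconstrained. The price is that $\psi_y$ vanishes wherever $\psi$ reaches the boundary of U, here at $y = \pm\pi/2$.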

Hence this transformation technique can be useful in computing the solution to a wide class of optimal control problems. It can also be used to apply the classical calculus of variations results to various types of optimal control problems [15, 28].
6.4 Optimal Control Problems With End-Point Constraints

Suppose we wish to solve the problem posed in section 2 of this chapter, as outlined in (19), subject to the additional constraint that some of the components of $x(t_1)$ are to be fixed numbers. That is, suppose the first q components of $x(t_1)$ are to be such that $x_i(t_1) = \bar{x}_i$ for $i = 1, 2, \ldots, q$, where the $\bar{x}_i$ are given scalars.

One approach to computing the solution to this problem would be a "penalty function" technique [ ]. This technique is the following. Use any admissible control $u = u(t)$ and integrate $\dot{x} = f(x(t), u(t), t)$ from $x^0$ at $t_0$ to $t_1$. At $t_1$ the components of x will probably not be the prescribed values $\bar{x}_i$, $i = 1, 2, \ldots, q$, so we compute
$$x_i(t_1) - \bar{x}_i = \Delta_i[u], \quad i = 1, 2, \ldots, q.$$
$\Delta_i[u]$ is the error in the ith component of $x(t_1)$ corresponding to the control $u = u(t)$. Then, for an arbitrary but fixed set of positive scalars $k_1, k_2, \ldots, k_q$, we compute the penalty associated with u as follows:
$$P[u] = \sum_{i=1}^{q} k_i (\Delta_i[u])^2. \qquad (28)$$
The functional of u which we seek to minimize by our algorithm is
$$J[u] = \int_{t_0}^{t_1} L(x(t), u(t), t)\,dt + P[u], \qquad (29)$$
where P[u] is given in (28).
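The penalty (28) enters the gradient computation only through the terminal condition of the costate in (30), $\lambda(t_1) = (\partial P/\partial x)^T$ at $t = t_1$. For the quadratic penalty this is $\lambda_i(t_1) = 2k_i(x_i(t_1) - \bar{x}_i)$ for $i \le q$ and zero otherwise. The helper below is a hypothetical sketch of that step alone. The forward and backward integrations are as in section 2 of this chapter.

```python
def penalty_and_terminal_costate(x_t1, targets, k):
    """Return P[u] of (28) and lambda(t1) = dP/dx(t1) for the first q = len(targets) components."""
    q = len(targets)
    deltas = [x_t1[i] - targets[i] for i in range(q)]          # Delta_i[u]
    P = sum(k[i] * deltas[i] ** 2 for i in range(q))
    lam_t1 = [2.0 * k[i] * deltas[i] for i in range(q)] + [0.0] * (len(x_t1) - q)
    return P, lam_t1

# hypothetical terminal state, targets for the first two components, and weights
P, lam_t1 = penalty_and_terminal_costate([1.5, -0.2, 4.0], [1.0, 0.0], [10.0, 100.0])
```

Larger $k_i$ make $\lambda(t_1)$, and hence the gradient, increasingly dominated by the end-point errors. This is the numerical trade-off discussed next.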
It can be shown, by analysis similar to that of section 2 of this chapter, that g[u], the gradient of J in (29), is given by
$$g[u] = (\lambda^T f_u(x(t), u(t), t) + L_u(x(t), u(t), t))^T, \qquad (30)$$
where
$$\dot{\lambda}(t) = -f_x^T(x(t), u(t), t)\lambda(t) - L_x^T(x(t), u(t), t), \qquad \lambda(t_1) = \left(\frac{\partial P}{\partial x}\right)^T\Big|_{t = t_1},$$
and $\dot{x}(t) = f(x(t), u(t), t)$.

While this technique appears to handle the end constraints very nicely, we are left with the problem of choosing the $k_i$'s. Because a digital computer carries only a finite number of significant figures, if the $k_i$'s are too large the algorithm will try to satisfy the end conditions at the expense of minimizing $\int_{t_0}^{t_1} L(x, u, t)\,dt$, and if the $k_i$'s are too small the algorithm may not be sensitive to violations of the end constraints. In some cases, Lasdon et al. [ ] have remarked that the penalty function terms in (29) may "create a steep-sided valley in the control space." This would slow the convergence of the algorithm.

Another possible method of computing the optimal control for a problem with end-point constraints is the projection method. This technique is discussed by Rosen [ ], Sinnott [ ], and Luenberger [25] for various algorithms. The adaptation of this technique to our algorithm appears to be rather straightforward, but we shall not pursue it here.
In the next section we shall examine the optimal control problem with end-point constraints for the case where the state differential equations are linear in the control.


6.5 Optimal Control Problems With Linear Constraining
Equations and End Conditions of the Type $Kx(t_1) = d$
Suppose our optimal control problem is to find that $L_r^2[t_0, t_1]$ function $u = u(t)$ which minimizes
$$\int_{t_0}^{t_1} L(x(t), u(t), t)\,dt \qquad (31)$$
with
$$\dot{x}(t) = G(t)x(t) + F(t)u(t), \quad x(t_0) = x^0,$$
where $t_1$ is fixed and $Kx(t_1) = d$.

We assume that L has continuous partial derivatives of at least second order in its x and u arguments and is piecewise continuous in t. G and F are matrix valued functions, the components of F being continuous. K is a q × n matrix of scalars and d is a q-vector of scalars, where q < n. Moreover, we assume that L is such that for any $u \in L_r^2[t_0, t_1]$ and its corresponding $x = x(t)$, the integral in (31) exists.

If we denote the principal matrix solution of the homogeneous system $\dot{x}(t) = G(t)x(t)$ by $\Phi(t, t_0)$, where $\Phi(t_0, t_0) = I$, then the state vector x corresponding to any admissible control u is given by
$$x(t) = \Phi(t, t_0)x^0 + \int_{t_0}^{t} \Phi(t, s)F(s)u(s)\,ds. \qquad (32)$$
Hence, by (32), we have for any admissible control $u = u(t)$
$$Kx(t_1) = K\Phi(t_1, t_0)x^0 + \int_{t_0}^{t_1} K\Phi(t_1, s)F(s)u(s)\,ds. \qquad (33)$$
In order to satisfy $Kx(t_1) = d$, we see from (33) that
$$\int_{t_0}^{t_1} K\Phi(t_1, s)F(s)u(s)\,ds = \omega, \qquad (34)$$
where $\omega = d - K\Phi(t_1, t_0)x^0$ is a fixed q-vector of scalars.

If we define a linear operator C from the space of admissible controls into the Hilbert space $R^q$ such that
$$Cu = \int_{t_0}^{t_1} K\Phi(t_1, t)F(t)u(t)\,dt, \qquad (35)$$
then u must satisfy
$$Cu = \omega \qquad (36)$$
in order to satisfy $Kx(t_1) = d$. It is known that if
$$\int_{t_0}^{t_1}\int_{t_0}^{t_1} \left\| F^T(s)\Phi^T(t_1, s)K^T K\Phi(t_1, t)F(t) \right\| ds\,dt < \infty, \qquad (37)$$
then C is a continuous linear operator. Since the components of K are scalars and $\Phi(t_1, t)$ and F(t) have continuous components on $[t_0, t_1]$, it follows that the components of $K\Phi(t_1, t)F(t)$ are bounded; hence (37) holds.

Hence we know that C as given by (35) is bounded. Then our optimal control problem as given by (31) becomes: from the set of admissible controls which satisfy $Cu = \omega$, given in (34), find that control which minimizes
$$J[u] = \int_{t_0}^{t_1} L(x(t; u), u(t), t)\,dt,$$
where $x(t; u)$ is given by (32). That is, we wish to minimize the differentiable function J[u] subject to the equality constraint $Cu = \omega$ for the bounded linear operator C given by (35). In section 2 of chapter 5 this type of problem was examined, and the application of the basic algorithm to compute the solution was explained.
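To connect (34)-(36) with the projection $V^{(0)}$ of section 2 of chapter 5, the sketch below samples the kernel $K\Phi(t_1, t)F(t)$ for a hypothetical two-state system with constant diagonal G, so that $\Phi(t_1, t) = \mathrm{diag}(e^{a_i(t_1 - t)})$. The system has a scalar control and a single end condition (q = 1 < n = 2). It approximates C by the trapezoid rule and projects onto the null space of C in the correspondingly weighted inner product. All data, the grid, and the quadrature are illustrative assumptions.

```python
import math

def end_constraint_kernel(a, F, K, t0, t1, n):
    """Samples of c(t) = K Phi(t1, t) F for x' = diag(a) x + F u, plus trapezoid weights."""
    h = (t1 - t0) / n
    ts = [t0 + i * h for i in range(n + 1)]
    kernel = [sum(Ki * math.exp(ai * (t1 - t)) * Fi for ai, Fi, Ki in zip(a, F, K))
              for t in ts]
    weights = [h / 2 if i in (0, n) else h for i in range(n + 1)]
    return ts, kernel, weights

def apply_C(kernel, weights, u):
    """Quadrature approximation of (35): C u = int K Phi(t1, t) F u(t) dt."""
    return sum(c * wt * uk for c, wt, uk in zip(kernel, weights, u))

def project_null_C(kernel, weights, v):
    """Orthogonal projection onto {u : C u = 0} in the inner product sum_k w_k u_k v_k."""
    cc = sum(wt * c * c for c, wt in zip(kernel, weights))
    coef = apply_C(kernel, weights, v) / cc
    return [vk - coef * c for vk, c in zip(v, kernel)]

a, F, K = [-0.5, 1.0], [2.0, 1.0], [1.5, -1.0]        # hypothetical data
t0, t1, x0, d = 0.0, 2.0, [1.0, 0.0], 0.7
ts, kernel, weights = end_constraint_kernel(a, F, K, t0, t1, 1000)
omega = d - sum(Ki * math.exp(ai * (t1 - t0)) * xi for ai, Ki, xi in zip(a, K, x0))  # (34)
```

Projected steps of this kind leave Cu, and hence $Kx(t_1)$, unchanged. Combined with a starting control satisfying $Cu_0 = \omega$, this is how theorem 5.3 keeps every iterate feasible.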



7. AN EXAMPLE, CONCLUSION, RECOMMENDATIONS, 
AND SUMMARY 
7.1 Example

In order to exhibit the convergence characteristics of the algorithm, we formally applied the procedures of chapter 2 to a sample optimal control problem which others have used to display convergence characteristics
of other algorithms [23, 3I1-, 36^) . The problem is the following: Find 

the function u = u(t) which minimizes
$$J = \int_0^{5} (x_1^2 + x_2^2 + u^2)\,dt \qquad (1)$$
subject to constraining differential equations described by the Van der Pol equation [ ] with ε = 1, that is,
$$\dot{x}_1 = x_2$$
$$\dot{x}_2 = -x_1 + (1 - x_1^2)x_2 + u$$
with initial conditions
$$x_1(0) = 3.0, \quad x_2(0) = 0.0. \qquad (2)$$


By (6.19) the gradient g of J at u is given by
$$g(t) = 2u(t) + \lambda_2(t), \qquad (3)$$
where
$$\dot{\lambda}_1 = (1 + 2x_1x_2)\lambda_2 - 2x_1$$
$$\dot{\lambda}_2 = -\lambda_1 - (1 - x_1^2)\lambda_2 - 2x_2 \qquad (4)$$
with
$$\lambda_1(5) = 0.0, \quad \lambda_2(5) = 0.0.$$
In order to compute the gradient g(t) of J at some u = u(t), we integrate (2) forward to t = 5.0 using u = u(t). Next, (4) is integrated from t = 5.0 back to t = 0.0. Then, using u = u(t) and the computed values of $\lambda_2$, we can compute g(t) given by (3).
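A minimal sketch of this gradient computation follows. The integrator is an assumption that differs from the one used for the reported results, which were obtained with an Adams-Bashforth/Adams-Moulton method and step 0.03125. Here explicit Euler with N = 500 steps is used. The backward recursion is the Euler-consistent (discrete adjoint) form of (4), so `g` is the exact gradient of the discretized J, with $\partial J/\partial u_k = h\,g_k$. The sketch therefore illustrates the procedure and is not expected to reproduce the thesis' numbers.

```python
def vdp_cost_and_gradient(u, t1=5.0):
    """Discrete (explicit Euler) sweep for (1)-(4); returns (J, g) with dJ/du_k = h * g[k]."""
    N = len(u)
    h = t1 / N
    x1, x2 = 3.0, 0.0                          # initial conditions (2)
    xs, J = [], 0.0
    for k in range(N):                         # forward sweep of (2)
        xs.append((x1, x2))
        J += h * (x1 * x1 + x2 * x2 + u[k] * u[k])
        x1, x2 = x1 + h * x2, x2 + h * (-x1 + (1.0 - x1 * x1) * x2 + u[k])
    l1, l2 = 0.0, 0.0                          # lambda_1(5) = lambda_2(5) = 0
    g = [0.0] * N
    for k in reversed(range(N)):               # backward sweep of (4)
        x1, x2 = xs[k]
        g[k] = 2.0 * u[k] + l2                 # gradient (3)
        l1, l2 = (l1 + h * (-(1.0 + 2.0 * x1 * x2) * l2 + 2.0 * x1),
                  l2 + h * (l1 + (1.0 - x1 * x1) * l2 + 2.0 * x2))
    return J, g

N = 500
u_start = [0.0] * N
J_start, g_start = vdp_cost_and_gradient(u_start)
```

As in the text, each gradient evaluation costs exactly one forward integration of (2) and one backward integration of (4).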

Figures 1 and 2 depict the progress toward the minimum of J using the algorithm outlined in chapter 2 with four different methods of choosing $\alpha_n$. These four methods of choosing $\alpha_n$ are:

Method 1: - 1 - (n^ + 2)" 1 / 2 for all n 

Method 2: $\alpha_n = 1$ for all n

Method 3: $\alpha_n = \min\{(-J(u_n) + J_0)/(s_n, g_n),\ 1.0\}$, where $J_0$ is the estimated minimum value of J, $s_n$ is defined by (2.2), and $g_n$ is the gradient of J at $u = u_n(t)$.

Method 4: $\alpha_n$ is the minimum with respect to $\alpha$ of $J(u_n + \alpha s_n)$, as computed by Davidon's one-dimensional cubic minimization method [ ].
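Method 4 rests on fitting a cubic to $\varphi(\alpha) = J(u_n + \alpha s_n)$ from the values and slopes at two points. The sketch below uses the cubic interpolation formula commonly attributed to Davidon, as stated in Fletcher and Powell's description of the DFP method. How the thesis brackets the minimum (the choice of the trial step t) and when it repeats the fit are not given in this excerpt, so a single interpolation step from a given bracket is an assumption.

```python
import math

def cubic_interpolation_step(phi0, dphi0, t, phit, dphit):
    """Minimizer of the cubic matching phi and phi' at 0 and t.

    Assumes dphi0 < 0 and that [0, t] brackets a minimum (dphit > 0 or phit > phi0).
    """
    z = 3.0 * (phi0 - phit) / t + dphi0 + dphit
    w = math.sqrt(z * z - dphi0 * dphit)
    return t * (1.0 - (dphit + w - z) / (dphit - dphi0 + 2.0 * w))

# phi(a) = (a - 0.7)^2 with trial step t = 2: values and slopes at 0 and 2
alpha_quad = cubic_interpolation_step(0.49, -1.4, 2.0, 1.69, 2.6)
# phi(a) = a^3 - 3a with t = 2: phi(0) = 0, phi'(0) = -3, phi(2) = 2, phi'(2) = 9
alpha_cubic = cubic_interpolation_step(0.0, -3.0, 2.0, 2.0, 9.0)
```

The fit is exact for quadratics and cubics, but each trial step t costs an extra functional and gradient evaluation. This accounts for the extra work of method 4 noted below.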

Methods 1 and 2 of choosing $\alpha_n$ satisfy the condition that
$$(1 - \alpha_n)n \to 0 \quad \text{as } n \to \infty$$
given in theorem 2.11. The value of $\alpha_n$ chosen by method 3 is a rough estimate of the minimum of J along the line $u_n + \alpha s_n$. The form of $\alpha_n$ for method 3 follows by considering $J_0 = J(u_n) + \alpha_n(s_n, g_n) + \text{h.o.t.}$, dropping the higher order terms, and solving for $\alpha_n$.
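The method 3 rule comes directly from the linear model just described. A one-line sketch follows, assuming that $(s_n, g_n) < 0$ (a descent direction) and that the estimate $J_0$ lies below $J(u_n)$, so that the unclipped value is positive. The numbers in the example are hypothetical.

```python
def alpha_method3(J_un, J0_est, s_dot_g):
    """alpha_n = min((J0 - J(u_n)) / (s_n, g_n), 1.0), from J0 = J(u_n) + alpha (s_n, g_n)."""
    return min((J0_est - J_un) / s_dot_g, 1.0)

a_small = alpha_method3(10.0, 4.0, -12.0)   # linear model predicts alpha = 0.5
a_capped = alpha_method3(10.0, 4.0, -3.0)   # model predicts 2.0, clipped to 1.0
```

Because it uses only $J(u_n)$ and $(s_n, g_n)$, which are already available, this choice needs no extra functional or gradient evaluations.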

Notice that methods 1, 2, and 3 of choosing $\alpha_n$ involve no extra functional and gradient evaluations. That is, for each iteration we must integrate (2) and (4) only once. The fourth method computes the one-dimensional minimum more accurately than method 3, but it involves at least one more functional evaluation per iteration. Hence, with the fourth method of choosing $\alpha_n$ we have at least two functional and gradient evaluations per iteration.

In Figure 1, we have plotted $J(u_n)$ versus n (i.e., the iteration number) for the four different methods of choosing $\alpha_n$. Figure 1 shows that the fastest convergence in terms of iterations is achieved by the algorithm with $\alpha_n$ chosen by method 4. Also, Figure 1 shows that after 12 iterations all the methods have converged. Moreover, after eight iterations, for all methods of choosing $\alpha_n$, the change in the value of J is too small to show up in the graph.

In Figure 2, we have plotted J versus the number of functional evaluations. Notice that in Figure 2, methods 3 and 1 converge faster with respect to function evaluations than method 4. Note also that after at most eight functional evaluations, the change in J is too small to be noticed in the graph.



Figure 1. $J(u_n)$ versus number of iterations n for the four methods of choosing $\alpha_n$ (Methods 1, 2, 3, and 4).
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Figure 3 shows the rates of convergence to the minimum for' the . 
example problem for the three first -order methods given in chapter 1, 
These results were reported by Tokumaru, ej; al . } . Note that the 

DFP algorithm shows the fastest rate of convergence 

Using the same initial estimate of $u$ that we used for the results shown in Figure 1, we applied the DFP method to the example problem. Our results for the DFP method were identical to those of the rank one algorithm with $\alpha_n$ chosen by method 4. For the two methods, both the reduction in the payoff and the iterates were the same.
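The agreement between the DFP method and the rank one algorithm with an accurate line minimization can be checked in finite dimensions. The sketch below is a hedged illustration built on several assumptions:

- a quadratic f(x) = 0.5 x^T A x - b^T x on R^n;
- an initial operator H_0 = I;
- exact line searches;
- the standard symmetric rank one (SR1) update as the rank one member.

All function names and the test data are hypothetical. They are not the example problem of this chapter.

```python
# Finite-dimensional check: rank one (SR1), DFP and conjugate gradients with
# exact line searches on a quadratic f(x) = 0.5 x^T A x - b^T x.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    return [[x * y for y in b] for x in a]

def add(M, N, scale):
    """Return M + scale * N for square matrices stored as lists of rows."""
    return [[M[i][j] + scale * N[i][j] for j in range(len(M))] for i in range(len(M))]

def gradient(A, b, x):
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]   # grad f = A x - b

def quasi_newton(A, b, x0, update, tol=1e-24):
    """Quasi-Newton iteration with H_0 = I and exact line searches; returns the iterates."""
    n = len(b)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x, g = list(x0), gradient(A, b, x0)
    iterates = [list(x)]
    for _ in range(n):                    # at most n steps on an n-dimensional quadratic
        if dot(g, g) < tol:
            break
        s = [-v for v in matvec(H, g)]    # search direction s_k = -H_k g_k
        alpha = -dot(g, s) / dot(s, matvec(A, s))   # exact line minimum
        p = [alpha * v for v in s]        # step p_k = x_{k+1} - x_k
        x = [xi + pi for xi, pi in zip(x, p)]
        g_new = gradient(A, b, x)
        q = [a - c for a, c in zip(g_new, g)]       # gradient change q_k
        H, g = update(H, p, q), g_new
        iterates.append(list(x))
    return iterates

def sr1(H, p, q):
    """Symmetric rank one update: H + r r^T / (r, q), with r = p - H q."""
    r = [pi - hq for pi, hq in zip(p, matvec(H, q))]
    denom = dot(r, q)
    if abs(denom) < 1e-14:                # usual safeguard: skip an ill-defined update
        return H
    return add(H, outer(r, r), 1.0 / denom)

def dfp(H, p, q):
    """DFP update: H + p p^T / (p, q) - (H q)(H q)^T / (q, H q)."""
    Hq = matvec(H, q)                     # computed from the old H
    H = add(H, outer(p, p), 1.0 / dot(p, q))
    return add(H, outer(Hq, Hq), -1.0 / dot(q, Hq))

def conjugate_gradient(A, b, x0, tol=1e-24):
    """Fletcher-Reeves conjugate gradients with exact line searches; returns the iterates."""
    x, g = list(x0), gradient(A, b, x0)
    s = [-v for v in g]
    iterates = [list(x)]
    for _ in range(len(b)):
        if dot(g, g) < tol:
            break
        As = matvec(A, s)
        alpha = -dot(g, s) / dot(s, As)
        x = [xi + alpha * si for xi, si in zip(x, s)]
        g_new = [gi + alpha * a for gi, a in zip(g, As)]
        beta = dot(g_new, g_new) / dot(g, g)
        s = [-gn + beta * si for gn, si in zip(g_new, s)]
        g = g_new
        iterates.append(list(x))
    return iterates
```

Take a symmetric positive definite A, for example [[4, 1, 0], [1, 3, 1], [0, 1, 2]] with b = [1, 2, 3] and x0 = 0. With those inputs, `quasi_newton(A, b, x0, sr1)`, `quasi_newton(A, b, x0, dfp)` and `conjugate_gradient(A, b, x0)` should produce matching iterate lists up to rounding. All three reach A^{-1} b in at most three steps.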



Figure 3. Comparison of first-order methods due to Tokumaru: $J(u_i)$ versus $i$
In Figure 4, we have plotted the values of $J(u_i)$ versus the number of function evaluations for the DFP method and for our algorithm when $\alpha_n$ is chosen by method 3. Notice that in terms of function evaluations, our method with this choice of $\alpha_n$ converges faster than the DFP algorithm. The linear minimizations for the DFP algorithm were carried out by method 4. This method was chosen because the DFP method requires high accuracy in the linear minimization.



Figure 4. Comparison of the Davidon-Fletcher-Powell method and the Rank One method with $\alpha_i$ chosen by the third method: $J(u_i)$ versus function evaluations
In Figure 5, we have plotted the control iterates $u_i(t)$, $i = 0, 1, 2, 3$, generated by our algorithm. The integrations of (2) and (4) were carried out by the Adams-Bashforth predictor and Adams-Moulton corrector method on a CDC 6000 series computer with a step size of 0.03125.
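The order of the Adams formulas and the starting procedure are not stated. The sketch below therefore assumes the common fourth-order scheme:

- a fourth-order Adams-Bashforth predictor;
- a fourth-order Adams-Moulton corrector, applied once per step (PECE);
- classical Runge-Kutta steps to generate the starting values.

Equations (2) and (4) are not reproduced in this section, so a scalar test equation stands in for them. The names `abm4` and `rk4_step` are hypothetical.

```python
def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for the scalar ODE y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def abm4(f, t0, y0, h, n_steps):
    """Fourth-order Adams-Bashforth predictor / Adams-Moulton corrector (PECE), scalar y."""
    ts, ys = [t0], [y0]
    # Start-up values from RK4 (assumed; the starter is not stated in the source).
    for _ in range(min(3, n_steps)):
        ys.append(rk4_step(f, ts[-1], ys[-1], h))
        ts.append(ts[-1] + h)
    fs = [f(t, y) for t, y in zip(ts, ys)]
    for _ in range(n_steps - 3):
        t_next = ts[-1] + h
        # Predict with the 4-step Adams-Bashforth formula.
        y_pred = ys[-1] + h / 24 * (55 * fs[-1] - 59 * fs[-2] + 37 * fs[-3] - 9 * fs[-4])
        # Evaluate, then correct once with the fourth-order Adams-Moulton formula.
        y_corr = ys[-1] + h / 24 * (9 * f(t_next, y_pred) + 19 * fs[-1] - 5 * fs[-2] + fs[-3])
        ts.append(t_next)
        ys.append(y_corr)
        fs.append(f(t_next, y_corr))   # final evaluation for the next step
    return ts, ys
```

For example, `abm4(lambda t, y: -y, 0.0, 1.0, 0.03125, 32)` integrates y' = -y from t = 0 to t = 1. The step 0.03125 = 2^{-5} is exactly representable in binary floating point, so the 32 steps land exactly on t = 1.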



Figure 5. $u_i$ versus $t$ for $i = 0, 1, 2, 3$, generated by the Rank One algorithm with $\alpha_i$ chosen by method 4

7.2 Conclusion

The algorithm outlined in chapter 2, when applied to compute the location of the minimum of a quadratic functional, has several attractive properties.

- Theorem 3.3 shows that if $\alpha_n$ is chosen by (2.17), then our algorithm, the DFP method, and the conjugate gradient method generate the same iterates. Hence, the three methods have the same rates of convergence whenever the hypotheses of theorem 3.3 hold.
- By theorem 2.8, the operators given by (2.9) converge pointwise to $A^{-1}$, where $A$ is given by (l.j). This property can be used to accelerate the convergence when many solutions corresponding to different initial conditions are desired, as discussed in section 1 of chapter 6. It is not available to the method of conjugate gradients.
- Theorem 3.3 also shows that if $\alpha_n$ is chosen by the fourth method, then our algorithm, the DFP method, and the conjugate gradient method generate the same iterates and hence have the same rates of convergence.
- Our algorithm requires one-half the storage necessary for the DFP method. It also requires the computation of one operator per iteration, versus two operators per DFP iteration.
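The storage comparison can be made concrete with the standard inverse updates. The formulas below are the textbook symmetric rank one and DFP forms, offered as an assumed illustration rather than a transcription of (2.9). They use our own notation:

- $H_n$ for the approximate inverse operator;
- $p_n = u_{n+1} - u_n$ and $q_n = g_{n+1} - g_n$;
- the Hilbert space outer product $(x \otimes y)z = (y, z)x$.

The rank one correction adds one new vector per iteration, while DFP adds two.

```latex
\begin{aligned}
H_{n+1}^{\mathrm{SR1}} &= H_n + \frac{(p_n - H_n q_n)\otimes(p_n - H_n q_n)}{(p_n - H_n q_n,\; q_n)},\\[4pt]
H_{n+1}^{\mathrm{DFP}} &= H_n + \frac{p_n \otimes p_n}{(p_n,\; q_n)} - \frac{H_n q_n \otimes H_n q_n}{(q_n,\; H_n q_n)}.
\end{aligned}
```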

The results of the example problem show that the algorithm can be applied with success when $\alpha_n$ is chosen in a variety of ways. Method 3 of choosing $\alpha_n$ appears to be best when the functional is very complex to evaluate, its computation is time-consuming, and storage considerations are not as important. If storage considerations are pressing and the computation of the functional is not as time-consuming, then method 4 would seem to be the best choice for $\alpha_n$.

7.3 Recommendations

Possible research topics related to this work are the following:

(1) Research could be done on the application of the algorithm outlined in chapter 2 to the solution of the singular linear operator equation

$$ Kx = d. \qquad (5) $$

Here, in (5), $x \in H_1$, a real Hilbert space. $K: H_1 \to H_2$ is linear, bounded, and has a closed range, and $d$ is a fixed element of $H_2$, another real Hilbert space. Nashed [j5l] has discussed solving this problem by using the method of steepest descent to compute a least squares solution. It therefore appears that the problem could also be solved by our algorithm. By using theorem 2.8, it could perhaps be shown that the operators given by (2.9), composed with $K^*$, converge pointwise to the generalized inverse of $K$. In a finite dimensional space this could give another technique for computing the generalized inverse of $K$.

(2) Research could be done to extend the class of first-order algorithms recently proposed by Greenstadt jjLhj to an infinite dimensional real Hilbert space.
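The following is a finite dimensional sketch of the steepest descent approach to (5) attributed to Nashed above. It minimizes J(x) = 0.5 ||Kx - d||^2 with exact step lengths, starting from x0 = 0, with K taken to be a matrix. This is our own illustration, not Nashed's or the thesis's code, and `steepest_descent_lsq` is a hypothetical name. Replacing the steepest descent direction by the rank one quasi-Newton direction is the research question itself and is not attempted here.

```python
def steepest_descent_lsq(K, d, x0, iters=200, tol=1e-14):
    """Steepest descent on J(x) = 0.5 * ||K x - d||^2 with exact step lengths.

    K is an m x n matrix given as a list of rows; d has length m. Every gradient
    K^T (K x - d) lies in the range of K^T, so iterates started at x0 = 0 stay there
    and approach the minimum-norm least squares solution (the generalized inverse of K applied to d).
    """
    m, n = len(K), len(K[0])
    x = list(x0)
    for _ in range(iters):
        r = [sum(K[i][j] * x[j] for j in range(n)) - d[i] for i in range(m)]  # residual K x - d
        g = [sum(K[i][j] * r[i] for i in range(m)) for j in range(n)]         # gradient K^T r
        gg = sum(v * v for v in g)
        if gg < tol:
            break
        Kg = [sum(K[i][j] * g[j] for j in range(n)) for i in range(m)]
        alpha = gg / sum(v * v for v in Kg)      # exact minimizer of J(x - alpha * g)
        x = [xj - alpha * gj for xj, gj in zip(x, g)]
    return x
```

As a usage example, take K = [[1, 1], [1, 1]] and d = [1, 3]. Here K is singular and the system is inconsistent. Starting from x0 = 0, the iteration reaches [1, 1] after one step, which is the minimum-norm least squares solution.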


7.4 Summary

The various elements of the class of rank one, quasi-Newton minimization methods are distinguished by the manner in which a particular parameter is chosen at each iteration.

In chapter 2, conditions were found which guarantee that, for various choices of this parameter, the rank one, quasi-Newton algorithms generate iterates which converge to the location of the minimum of a quadratic functional.

In chapter 3, the iterates of the rank one, quasi-Newton algorithm with the parameter chosen by a linear minimization technique are compared with the iterates of the Davidon-Fletcher-Powell method and the method of conjugate gradients. It is found that, for a quadratic functional satisfying the hypotheses of theorem 3.3, the iterates of the three methods are the same.

In chapter 4, an idea due to Powell is extended to infinite dimensional Hilbert spaces.

In chapter 5, a modification of the rank one, quasi-Newton method is outlined in order to minimize a functional subject to linear constraints. Conditions are found which guarantee convergence to the location of the constrained minimum of a quadratic functional.

In chapter 6, the application of these rank one, quasi-Newton minimization methods to various types of optimal control problems is investigated.

In chapter 7, the rank one, quasi-Newton methods are applied to a sample optimal control problem. The results are compared with the results of other known first-order minimization techniques for the same sample problem. The comparison is in terms of speed of convergence, measured both in iterations and in number of functional evaluations. On this sample problem, the rank one, quasi-Newton algorithms are shown to be superior.